AI startups to watch: Meet the hottest machine learning startups in the UK
The year 2016 saw a host of tech giants acquire AI startups, and 2017 has followed a similar pattern. Apple, Intel, Twitter and Microsoft have all spent large sums to bring artificial intelligence startups, and their expertise, in-house. Four of the biggest AI startup acquisitions of the last five years have come from the UK, starting with Google's purchase of DeepMind in 2014 for a reported £400 million. Since then Apple has purchased Cambridge-based natural language processing specialists VocalIQ, Microsoft has bought the machine learning powered keyboard SwiftKey, and Twitter has acquired Entrepreneur First alumni Magic Pony. https://www.techworld.com/picture-gallery/startups/uk-ai-startups-watch-hottest-machine-learning-startups-in-uk-3645606/
Kaggle have just launched a set of free resources for learning Machine Learning, R, Data Visualisation and Deep Learning. They look like an ideal introduction for anyone wanting to get a start in these disciplines. Take a look here: https://www.kaggle.com/learn/overview
Machine learning solves the cognitive gear problem of a recruiter
Kate had been a busy recruiter, yet she still could not deliver as well as she wanted. The business was not happy with her pace of recruitment and had been complaining that delivery had slowed down because of it. Kate had been working hard, but the world had moved ahead. Kate scans each resume and marks it for interview or selection. There are three issues in any job: A) time to deliver, B) context switching and C) complexity. Kate does a job of low complexity and high context switching, and she has time pressure to deliver. The solution to Kate's problem, switching context from one resume to the next, looking at a large volume of data and drawing conclusions, is going to come from a digital assistant.

So what is this digital assistant? A bot that can scan a resume, quickly find all the keywords, match them against the job description and the hiring managers' preferences, pick up the candidate's social media handles and read through them, and give a qualitative output is what Kate is looking for as a solution. We probably need a machine learning solution for Kate.

Let us define machine learning here. Machine learning is a program that acts on a class of tasks T and gives a performance P using experience E, and this performance P improves with the experience E. Let us check what task Kate had been doing. Kate had been classifying resumes and matching them to job descriptions, i.e. the task T. Her performance P is the number of resumes she is able to qualify in a day with her experience E.

Machine learning has two types of learning, namely supervised machine learning and unsupervised machine learning. Supervised machine learning is the one where you are given a label to learn. To give a simple example: I teach the machine the shape of a circle, and if I give it any image that resembles a circle, it is able to classify it as a circle. Similarly, if I need to teach the machine about a resume and an output which matches the job description, then I have to use a classifier algorithm. So supervised machine learning classifies a given input in accordance with the output (label) it has been trained on. Here we have the input (a resume) and the output (whether it qualifies for a specific job description). The first step is to train the machine by giving the input to a classifier algorithm along with the output already known to us.

What is training? You have a set of documents, and say these documents belong to a certain topic. How do you train on and represent them? The first step is to convert each document into a vector model. You have hundreds of resumes; assume that each resume contains 1,000 words, and you convert each resume into a vector using the TF-IDF algorithm. TF-IDF stands for Term Frequency-Inverse Document Frequency, a measure often used in information retrieval and text retrieval. Feed these documents into the algorithm and convert them into vectors. The sentences as well as the words in each document are stored in a vector model, and you can store these in MongoDB. Each vector represents a classified output, say a job description. Say a resume belongs to a business analyst: there is a word vector for the sentences that qualify a business analyst. The algorithm converts every resume into a vector with an output classification defining the job. Once you have an engine trained to read resumes and give accurate output, the next step is to feed it a new resume, and based on the experience E the classifier will give the output it has been trained for. So what did the solution achieve?
The performance P increased over a set of tasks T with the experience E, so machine learning made the lives of people like Kate much easier.
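To make this concrete, here is a minimal sketch of the kind of supervised resume classifier described above, assuming Python with scikit-learn; the resume snippets, the two job labels and the choice of LogisticRegression as the classifier are illustrative assumptions rather than the exact system Kate would use, and the MongoDB storage step is omitted.

# Minimal sketch: TF-IDF vectorisation plus a supervised classifier on toy data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: resume text paired with the job it qualified for (Experience E).
resumes = [
    "gathered requirements, wrote user stories, built SQL reports for stakeholders",
    "developed REST APIs in Java, microservices, CI/CD pipelines",
    "requirements analysis, stakeholder workshops, business process mapping",
    "Java backend development with Spring, unit testing, code reviews",
]
labels = ["business_analyst", "java_developer", "business_analyst", "java_developer"]

# Task T: classify a resume against a job description.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(resumes, labels)

# A new, unseen resume is scored using what the model has learned (Performance P).
print(model.predict(["ran stakeholder workshops and documented requirements"]))

With more labelled resumes per role, the same pipeline improves its performance P as its experience E grows, which is exactly the behaviour described above.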
Pandas Cookbook: Develop Powerful Routines for Exploring Real-World Datasets
My name is Ted Petrou, and I am the author of the newly released Pandas Cookbook. In this article, I will discuss the overall approach I took to writing Pandas Cookbook along with highlights of each chapter.

Pandas Cookbook Guiding Principles

I had three main guiding principles when writing the book: use of real-world datasets, a focus on doing data analysis, and writing modern, idiomatic pandas. First, I wanted you, the reader, to explore real-world datasets and not randomly generated data. I tried very hard to find datasets that contained situations where an interesting or unique pandas operation could be performed. Descriptions of the main datasets used throughout the book can be found in this Jupyter notebook. Second, I wanted to focus on doing actual data analysis by providing useful or surprising insights. I wanted to avoid a mechanical approach where pandas operations were learned in isolation or were devoid of contact with real data. In this regard, Pandas Cookbook teaches both how to understand pandas operations and how to generate results that would be useful for a data analysis. Third, the pandas library has evolved quite substantially since it first started to make regular appearances in data analysis workflows in 2012. Many of the older tutorials, and especially the older answers on Stack Overflow, have not been updated to reflect newer syntax. Pandas is confusing because there are often multiple ways to produce the same result, many of which will be slow or inefficient. Pandas Cookbook strives to provide straightforward and efficient, or 'idiomatic', pandas.

Formation of Pandas Cookbook

Pandas Cookbook was inspired by the following: my weeklong Data Exploration Bootcamp, answering 400+ questions on pandas on Stack Overflow, working as a data scientist at Schlumberger, and hosting dozens of meetups for Houston Data Science. My Data Exploration Bootcamp is an intensive, weeklong class with over 700 pages of material, 250 short-answer questions and a couple of projects. Much of the material for Pandas Cookbook was inspired by this class. The material was expanded and refined each time the class was taught, thanks in part to the excellent feedback from my students. Teaching showed me firsthand exactly where the greatest pain points were. Nothing helped me improve my own ability to write idiomatic pandas more than answering questions on Stack Overflow. You learn an incredible amount by answering questions and discussing them with the other top users. As a data scientist at Schlumberger, I built scripts to clean and process data that required the use of dozens of pandas commands pieced together. Pandas Cookbook has many advanced recipes that combine operations from different parts of the library to get the required result. Also, I was given a week's worth of professional Python training, which was quite bad and sparked a desire to produce a better class. You can hear more of my story in this podcast from Undersampled Radio.

A Book Must Beat the Documentation

The official documentation itself is very thorough, at over 2,000 pages in total. For a book to be of any value, at a minimum it must be better than the documentation. There are some major advantages of the documentation over a book. First, there is no restriction on page length, so every single aspect of the library can be covered. Second, the documentation is always up to date with the latest changes.
Technical books on fast-moving libraries like pandas tend to go out of date relatively fast.

How Pandas Cookbook Demolishes the Documentation

Unfortunately, the pandas documentation does not have interesting examples using real-world datasets. Nearly all of its examples are done using randomly generated or contrived data, showing operations in isolation from one another. You learn how to run a single command, independent of all the other available ones. This is not at all how an analysis happens using actual data. There is certainly lots of value in learning the mechanics of all the pandas operations, and I suggest doing that in my How to Learn Pandas article. In fact, I have read through most parts of the documentation five or more times each. Pandas is a huge library and it's difficult to keep all the commands in the forefront of your mind, even if you use it every single day. Pandas Cookbook uses multiple operations one after the other in many of its recipes. This often yields a long chain of methods called from a DataFrame or Series. This is what makes Pandas Cookbook valuable: you are constantly working with real data, stringing together multiple pandas operations to complete a particular task.

Cookbook Format

It's a bit unfortunate/ludicrous that the title of the book sounds appalling to those not in the know. I suggest keeping this book in the kitchen next to your other cookbooks for some guaranteed extra laughs. The book is composed of approximately 100 recipes, with each one containing three major sections. How to do it: step-by-step code on how to complete a particular task, with some explanations embedded into the steps themselves. How it works: very detailed explanations of all the steps in the recipe. I read lots of reviews of other Packt cookbooks, and the most common complaint was the lack of explanations in this section, so I took extra care to ensure that all steps and commands were fully explained. There's more: extra operations, closely related to the main recipe. There are almost always tangents that you can follow when learning pandas, and this section is often equivalent to an entire new recipe.

Entire Focus on Pandas

This book makes one basic assumption: that you are comfortable with the fundamentals of Python. Every single recipe (except one or two) uses pandas. Thus, the scope of the book is a bit narrower than that of other similar books, in that it only focuses on doing data analysis with pandas (along with matplotlib and seaborn for visualization).

Target Audience

There is no hard requirement for any prior exposure to pandas. The recipes range from very simple to advanced, so the book is suitable for novices as well as experienced pandas users.

Getting the Most out of Pandas Cookbook

To get the most out of Pandas Cookbook, I suggest doing the following: keep the official documentation open at all times, run the code in the Jupyter notebooks as you read the book, and read the book sequentially, cover to cover. Pandas Cookbook strives hard to differentiate itself from the documentation. This doesn't mean it is a replacement for the documentation. Most recipes link to a specific part of the documentation, where you can get more details on a specific command. This is why I recommend keeping the documentation open as you progress through the book. Do not just read the book. Run the code as you read through each recipe. You should be doing lots of exploration and formulating questions on your own. I also recommend reading this book sequentially.
I recommend this whether you are a novice or an experienced pandas user. The recipes have a natural flow, progressing from one to the next and tending to get more and more complex. More experienced users, of course, can skip around to the recipes that appeal to them most. But I've found that, unless you are a power user of pandas, it is still good to drill the fundamentals, which is what reading the book sequentially does.

Chapter Highlights

Below, I discuss a few of the more important concepts and recipes of each chapter.

Chapter 1: Pandas Foundations

Chapter 1 begins by dissecting the anatomy of the DataFrame and Series, the primary objects that will handle the bulk of your workload. It is vital to be aware of the DataFrame components: the index, the columns and the data (values). The chapter continues by selecting a single column from a DataFrame as a Series. We use this Series to learn about method chaining, which is an extremely common way to use pandas. The majority of the recipes in the book string together multiple methods in succession like this.

Chapter 2: Essential DataFrame Operations

Chapter 2 focuses entirely on the DataFrame. We learn how to order columns sensibly, a commonly overlooked task that can greatly improve the readability of the data. As a practical and fun example, we determine the diversity of college campuses using many of the concepts covered up to this point.

Chapter 3: Beginning Data Analysis

Chapter 3 covers several fairly simple but complete tasks that you might do when first starting an analysis. It can be immensely helpful to establish a routine at the beginning of a data analysis. Another recipe finds the largest/smallest value in column 'x' for every unique value in column 'y' without using a call to the groupby method. This is an example of one popular idiom that has arisen more recently.

Chapter 4: Selecting Subsets of Data

Chapter 4 selects subsets of DataFrames and Series in just about every way imaginable. Data selection is one of the most confusing aspects of the library, which is unfortunate, as it is used very frequently. Pandas is partially to blame here, as indexing changed with the addition of the .loc/.iloc indexers along with the recent deprecation of .ix.

Chapter 5: Boolean Indexing

Chapter 5 covers boolean indexing, which is used to select subsets of data by the actual content of the columns and not by their label or integer location (as in Chapter 4). One common theme throughout Pandas Cookbook is the comparison between different methods that produce the same results. In one recipe in this chapter, we show how boolean indexing can be replicated by placing columns into the index. For those familiar with SQL, boolean indexing is also compared to the WHERE clause.

Chapter 6: Index Alignment

All of Chapter 6 is dedicated to one of the most powerful, but unexpected, features of pandas: automatic alignment of indexes. Some users can spend years using pandas without ever understanding this concept. Automatic index alignment is what separates pandas from most other data analysis libraries. An absurd example is the 'Exploding Indexes' recipe, which is used to hammer home exactly what happens when combining multiple pandas objects.

Chapter 7: Grouping for Aggregation, Filtration, and Transformation

The first six chapters cover the most fundamental parts of pandas in 200 pages. The remaining five chapters, and 300 pages, use these fundamentals in just about every recipe to do more complex and interesting analysis.
The groupby method in this chapter is particularly helpful for splitting data into independent groups. One particularly fun recipe uses the transform method to calculate the results of a weight-loss bet. Also, one of the most complex recipes resides in this chapter and finds the streaks of on-time flights for each airline.

Chapter 8: Restructuring Data into a Tidy Form

Data analysis is made easier when you have tidy data, a term popularized by Hadley Wickham. Chapter 8 transforms many different formats of messy data into tidy data with the following methods: stack, unstack, melt, and pivot. You will also be exposed to the str accessor, which is used to rip apart string data to extract new variables. Chapter 8 is probably the most unique chapter in this book, as I have not seen much discussion online on how to tidy the vast assortment of datasets as is done in this chapter.

Chapter 9: Combining Pandas Objects

There are four primary methods/functions that are used to combine DataFrames/Series: append, concat, merge and join. This chapter provides examples suited to each. 'Comparing President Trump's and Obama's approval ratings' is one of my favorite recipes, which does intricate web scraping, moving-window analysis and visualization all in one. This chapter also connects to a relational database with multiple tables to perform an analysis one might normally do with SQL.

Chapter 10: Time Series Analysis

Pandas has powerful time series functionality that exceeds that of the datetime and NumPy libraries. You will learn how to group simultaneously by time and another variable. Also, one of the newest additions to pandas, the merge_asof function, will be used to find the last time crime was 20% lower.

Chapter 11: Visualization with Matplotlib, Pandas, and Seaborn

One of the most infuriating and confusing things about matplotlib is its dual interface. In my opinion, all matplotlib code should be written with the object-oriented interface, as it is more Pythonic. Pandas Cookbook thoroughly covers how to get started with the object-oriented interface along with the Figure/Axes hierarchy, which is key to understanding all of plotting in matplotlib. Pandas and seaborn both use matplotlib to make plots, but in completely different ways: pandas uses wide or aggregated data while seaborn takes long or tidy data. One particularly useful recipe for data scientists involves 'uncovering Simpson's paradox', a very common finding that gets revealed whenever you look at more granular slices of your data.

Lots More!

The chapter highlights are just a small sampling of what is contained in the book. I worked extremely hard to make Pandas Cookbook the very best book available for learning pandas while doing analysis with real-world data. I had lots of fun coming up with the recipes and hope you have fun exploring them.
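To give a flavour of the chained, idiomatic pandas style the book builds towards, here is a minimal sketch on invented airline data (not one of the book's actual recipes) that melts a wide table into tidy form and then aggregates it with groupby, touching the ideas from Chapters 7 and 8.

import pandas as pd

# Wide (untidy) data: one column per year.
wide = pd.DataFrame(
    {"airline": ["AA", "DL"], "2016": [81.2, 84.5], "2017": [79.9, 86.1]}
)

# Melt into tidy form, then group and aggregate in a short method chain.
tidy = wide.melt(id_vars="airline", var_name="year", value_name="on_time_pct")
summary = tidy.groupby("airline")["on_time_pct"].mean()
print(summary)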
Great article Ted!
5 Misconceptions About Data Science
In this contributed article, technology writer and blogger Kayla Matthews examines the 5 most common misconceptions floating around about data science and what project administrators and business managers need to be aware of. Remember these tips before getting involved, and be sure to do the necessary research. With the right people and knowledge on your side, you'll be on your way in no time, rocketing to success. https://insidebigdata.com/category/news-analysis/
Narayana Murthy dismisses artificial intelligence as ‘more hype than reality’
Corporate thought-leader and Infosys co-founder NR Narayana Murthy has flayed the high wage hikes that senior managements have been apportioning to themselves when the software industry is in trying times and has advised them to make "sacrifices" to maintain the common man's faith in capitalism. http://m.thehindubusinessline.com/info-tech/narayana-murthy-dismisses-artificial-intelligence-as-more-hype-than-reality/article10001743.ece
How to Ace Your Data Science Interview
With dissertation deadlines looming, data science students are gearing up to leave the academic world and find their feet in a data science role. We all know that the demand for these skills and the short supply of experienced data scientists mean there are opportunities everywhere, and companies are looking to secure graduate talent, so finding data science jobs should not be too difficult. But before you reach that commercial goldmine, you're faced with the job interview. No matter how much experience and exposure you have from previous interviews, public speaking or data science discussions, the preparation is still hard. Data science interviews tend to cover a wide range of topics, from technical exposure, to statistical understanding, to solving and communicating complex business problems. At Eden Smith we work with a number of businesses hiring across the data science spectrum, and to help you ace your interview we have curated a list of common data science interview questions. We have enriched this with information from online sources and insight from our data science partners to help you prepare for the types of questions that can be thrown at you during your data science interview.

Building Models

Building data models for machine learning, or for pure data transformation and analysis, is one of the most common tasks of the modern data scientist. More and more businesses are developing teams, particularly of graduates, that are modelling- and coding-heavy, and this is resulting in more interviews covering the various modelling techniques and statistical theories. Not all interviews will be technical, but below are some questions to help you prepare and refamiliarize yourself.

How would you create a logistic regression model?
What is linear regression?
What do the terms P-value, coefficient and R-squared value mean? What is the significance of each of these components?
Why is the Central Limit Theorem important?
Explain hash table collisions.
In your opinion, which is more important when designing a machine learning model: model performance or model accuracy?
What are some situations where a general linear model fails?
Is it better to have too many false positives, or too many false negatives?
How would you validate a model you created to generate a predictive model of a quantitative outcome variable using multiple regression?
What is an example of a dataset with a non-Gaussian distribution?
Explain Bayes' Theorem. When might you use Bayesian inference?

Programming

Most data science teams are involved both in the ingestion of data for modelling and analysis and in putting models into production in the enterprise environment. Whether this is led by a data engineering, software engineering or database development team, you will be expected to have a strong understanding of various programming languages, both those directly involved in data science and those surrounding data integration and exportation. Be sure to brush up on your Python, R, SQL and relevant big data programming languages such as Scala.

Python or R: which would you prefer for text analysis?
What modules/libraries/packages are you most familiar with? What do you like or dislike about them?
What are the different types of sorting algorithms available in the R language?
What is the difference between a tuple and a list in Python?
How do you split a continuous variable into different groups/ranks in R?
What is the purpose of group functions in SQL? Give some examples of group functions.
Tell me the difference between an inner join, a left join/right join, and a union.
Describe a data science project in which you worked with a substantial programming component. What did you learn from that experience?
How would you clean a dataset in your preferred programming language?
What are the two main components of the Hadoop framework?

Data Science Process

Although being hands-on with data, modelling and programming are the major aspects of any data science role today, businesses are often also looking to understand how insights and results are created. Interviewers are looking for you to demonstrate a clear understanding, and to be able to explain various methods and processes used throughout a data science project, together with their pros, cons and use cases, to a non-technical audience. Practice articulating and giving clear, simple explanations of various complex data science procedures.

What are the various steps involved in an analytics project?
What is the goal of A/B testing?
Explain the use of combinatorics in data science.
What is the difference between cluster and systematic sampling?
What is logistic regression? State an example of when you have used logistic regression recently. (One way to sketch an answer is shown at the end of this article.)
Explain false negatives and false positives. Which is better to have too many of?
What was the business impact of your last project?
Can you explain the difference between a test set and a validation set?
What makes a dataset gold standard?
What are outliers and inliers? What would you do if you found them in your dataset?

General

Data science is still a position with great variety and a lack of standardisation across the market. Therefore, every data science position and company you interview for will take a slightly different approach and expect additional skills and awareness of the surrounding subjects. Be sure to explore the business you're interviewing with, check out its current employees (data scientists and analysts) and see what additional products, technologies and soft skills they have experience with. Some common general questions are:

What visualisation tools are you familiar with?
Explain a time when you had to handle stakeholders' expectations.
Describe a time when you have been innovative and creative.
Which cloud services have you used and how have you interacted with them?
What external data sources do you think could be interesting to our domain?
Present to us your last data science project.
What's a project you would want to work on at our company?
What data would you love to acquire if there were no limitations?
How important is the product in data science?

Eden Smith

If you want more advice or support on how to land your dream data science opportunity, or if you're a manager looking to scale a data science team, get in touch with us today.
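For the logistic regression question flagged above, here is a minimal sketch of how an answer might be demonstrated in Python; scikit-learn and its built-in breast-cancer dataset are assumptions chosen purely for illustration, not part of any particular interview.

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small built-in binary classification dataset.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Fit a logistic regression model and evaluate it on held-out data.
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))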
Methods for dealing with missing values in datasets
AlMazloum, Amer Eddin, Heriot-Watt University. Professor: Dr. Hani Ragab

Missing Values in Data

Missing data can occur because of nonresponse: no information is provided for one or more items or for a whole unit ("subject"). Some items are more likely to generate a nonresponse than others.

Missing Data Mechanisms

Missing completely at random (MCAR): Suppose variable Y has some missing values. We say that these values are MCAR if the probability of missing data on Y is unrelated to the value of Y itself or to the values of any other variable in the data set. In other words, whether y is missing depends neither on x nor on y.

Missing at random (MAR): The probability of missing data on Y is unrelated to the value of Y after controlling for other variables in the analysis (say X). In other words, whether y is missing depends on x, but not on y.

Not missing at random (NMAR): Missing values do depend on unobserved values. In other words, the probability of a value being missing depends on the variable that is missing.

Patterns of Missingness

We can distinguish between two main patterns of missingness. On the one hand, data are missing monotonically if we can observe a pattern among the missing values; note that it may be necessary to reorder variables and/or individuals. On the other hand, data are missing arbitrarily if there is no way to order the variables to observe a clear pattern (SAS Institute, 2005).

Methods for Handling Missing Data

Deletion methods

Listwise deletion: If a case has missing data for any of the variables, simply exclude that case from the analysis. This is usually the default in statistical packages (Briggs et al., 2003). In this case, rows containing missing values are deleted.

Pairwise deletion: Analyse all cases in which the variables of interest are present. In other words, only the missing observations are ignored and the analysis is done on the variables that are present.

Imputation methods

Popular averaging techniques: Mean, median and mode are the most popular averaging techniques used to infer missing values. Approaches ranging from a global average for the variable to averages based on groups are usually considered. Put simply, replace the missing value with the sample mean or mode.

Conditional mean imputation: Suppose we are estimating a regression model with multiple independent variables, one of which, X, has missing values. We select those cases with complete information and regress X on all the other independent variables. Then we use the estimated equation to predict X for the cases where it is missing (Graham, 2009; Allison, 2001; Briggs et al., 2003).

Model-based methods

Maximum likelihood: We can use this method to obtain the variance-covariance matrix for the variables in the model based on all the available data points, and then use this variance-covariance matrix to estimate our regression model (Schafer, 1997). In other words, the estimate is the value that is most likely to have resulted in the observed data.

Multiple imputation: The imputed values are draws from a distribution, so they inherently contain some variation. Thus, multiple imputation (MI) addresses the limitations of single imputation by introducing an additional form of error based on the variation in parameter estimates across the imputations, which is called "between-imputation error". It replaces each missing item with two or more acceptable values, representing a distribution of possibilities (Allison, 2001).
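As a concrete illustration of the deletion and mean-imputation methods above, here is a minimal sketch assuming Python with pandas and scikit-learn; the toy DataFrame is invented for the example, and the other methods (conditional mean imputation, maximum likelihood, multiple imputation) would need more machinery than shown here.

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy data with missing values.
df = pd.DataFrame({"age": [25, np.nan, 38, 41], "income": [30000, 45000, np.nan, 52000]})

# Listwise deletion: drop any row that contains a missing value.
listwise = df.dropna()

# Mean imputation: replace each missing value with its column mean.
imputer = SimpleImputer(strategy="mean")
imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)

print(listwise)
print(imputed)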
How do you deal with missing values: ignore or treat them? The answer depends on the percentage of missing values in the dataset, the variables affected by missing values, whether those missing values are part of the dependent or the independent variables, and so on. Missing value treatment is important because the data insights or the performance of your predictive model could be impacted if the missing values are not appropriately handled.

In conclusion, the assumptions about and patterns of missingness are used to determine which methods can be used to deal with missing data.

Sources and useful resources:

Reports:
http://www.bu.edu/sph/files/2014/05/Marina-tech-report.pdf
https://liberalarts.utexas.edu/prc/_files/cs/Missing-Data.pdf

References:
Allison, P., 2001. Missing Data. Quantitative Applications in the Social Sciences, Vol. 136. Thousand Oaks, CA: Sage.
Enders, Craig, 2010. Applied Missing Data Analysis.
StataCorp, 2009. Multiple Imputation. Stata 11.
Schafer, J. L., 1997. Analysis of Incomplete Multivariate Data.

Useful links:
Data sets with missing values that can be downloaded in different formats including SAS, Stata, SPSS and S-Plus: http://www.ats.ucla.edu/stat/examples/md/default.htm
Introduction to missing data with useful examples in SAS: http://www.ats.ucla.edu/stat/sas/modules/missing.htm
Multiple imputation in SAS, with comprehensive explanations: http://www.ats.ucla.edu/stat/sas/seminars/missing_data/part1.htm
Very nice review, thanks Amer. I have worked once with sparse matrices with missing data, and I think your article is relevant. Thanks, F
Drowning in Data?
Find real value and insights in the intersections between small data and big data.

Data: a set of facts and statistics collected together for reference (https://en.oxforddictionaries.com).

As I talk to many business leaders, time and time again I hear the same frustration: "I'm drowning in data; what I need are real insights to drive my decision-making." Despite these frustrations and the volumes of data available, I am amazed at the degree to which decisions are still being taken on often a very small number of discrete data points, with businesses often using separate, disconnected data sets from different vendors to make decisions.

So is data a definitive thing, a trusted source, a fact or figure that can be referenced to reinforce or justify a statement or decision? If so, then surely the size of the data set does not matter, as long as the necessary fact can be derived? Well, it's not a yes-or-no answer; in fact, it's yes and no.

In daily life, when looking at any situation, do we look at just one side of an argument or issue? No, because in open, freethinking societies we shift our viewpoints to consider other views. In practice, this means triangulating our understanding with many differing sources and views to get much richer information on which to make well-informed decisions. Our perspectives and rationales therefore change depending on varying factors and the information available at hand. Such an approach, however, requires a common context with, at a minimum, some aligned data points for consideration.

If this is how we naturally make our daily decisions, why should our use of data to inform business decisions be any different? Seeing how we make informed decisions on a personal level, is it any surprise that there is a real desire to get deeper and richer perspectives, and therefore real insights, from the data sources we use at a corporate level?

So with so much data available this should be easy, shouldn't it? If everyone is drowning in data, there must be enough relevant information to provide real insights, right? Sadly, the real problem is the complexity of aligning the available data to provide the triangulation and joined-up views required. Different definitions, taxonomies, standards and complex data integration make finding actionable insights from data sources a real problem.

Stepping away from ICT to give a real-world example, let's use a topical issue in the EU right now, Brexit, to make the point. How many ill-informed choices were made on both sides of the argument, with an inaccurate application of individual, non-joined-up facts preventing the well-rounded argument needed to support decision-making? How many non-fact-based subjective views were fuelled by bias and prejudice? The answer can be seen in the general confusion and frustration arising after the Brexit vote. Prime Minister Theresa May could easily be one of the business leaders quoted in my opening gambit: "I'm drowning in data; what I need are real insights to drive my decision-making."

Returning to ICT sourcing intelligence: at Pivotal iQ we believe that value is derived best when we are able to use data the way we should use information in daily life, looking through different dimensions of interconnected facts and figures to see different perspectives of a client, contract or opportunity and to identify the subtleties behind a situation that will inform a decision.
We believe real value is actually in the intersections of data. Let me provide an example of how value can be derived in this way using three seemingly unrelated Big Data points:

Company A has an outsourcing contract with Company B due for renewal in 12 months.
Company C has an outsourcing contract due for renewal in 10 months.
Company D has an outsourcing contract with Company E due for renewal in 12 months.

What we have here in isolation are a number of data points that individually are useful but do not provide sufficiently rich insights on an opportunity. In fact, we could look at each and make many assumptions. However, by building relationships between facts and interconnecting 'small data' we can start to build richer insights:

Company A has an outsourcing contract with Company B due for renewal in 12 months.
Company A isn't very happy with Company B's delivery performance.
Company B has just released poor financial results.
Company B has just partnered with Company C.
John, a CTO at Company A, has traditionally had good relations with Company C.

A service provider looking at this opportunity may indeed decide to prioritise it, as the client's dissatisfaction provides an opportunity for displacement. The service provider may also seek to partner with Company C, or factor this association into their sales strategy with Company A. This type of in-depth data, when combined, produces actionable insights. Indeed, Forbes.com (2013) confirms: "Data is meaningless unless it helps make decisions that have measurable impact. Unfortunately, many decision makers are ensnared rather than enlightened by Big Data, preventing data and insights from making it to the front lines in relevant and usable forms."

I recently caught up with a global ICT service provider that used the joined-up approach I advocate to build a picture of an international customer's installed technologies across its many sites. By joining company, spending and installed-base data, they were able to see across the company's sites and installations and identify an opportunity for consolidation within the company that the global provider was well placed to fulfil. The positive outcome was a huge order win, made possible by the real insights provided by the 'small data' between the 'big data' points.

At Pivotal iQ, our solution has always been to standardise, building and integrating data sources that allow for cross-sectional views of companies, opportunities, installed technologies, transactions and announcements, allowing 'small data views'. Integrating several data facts in this way makes for much richer insights. We believe that what you see depends on what you look for. By combining a Big and Small Data approach, we allow you to see the opportunities others can't, by providing the ability to see value and insight in the intersections of data.

I urge every business leader to challenge their data approach, to see how it can be improved using the Big and Small Data principles championed by Pivotal iQ, to provide the richer sourcing insights they demand.
Data Science Foundation at Big Data London, November 2017
It was great to meet a lot of prospective members. The curiosity factor was very high. Some data scientists called DSF the LinkedIn of data science. This is a good way to look at it.