
Quantitative Big Data Analysis Limitations: When Numbers Fail to Tell the Full Story!

A DSF Whitepaper
15 March 2020
Michael Baron

It’s no secret that Data Analytics is significantly easier to handle when it focuses on quantitative data analysis methods rather than qualitative ones. When dealing with Big Data, the challenges of qualitative data collection and mining approaches become particularly apparent. Qualitative Data Analysis methods and tools tend to be expensive and complex to implement, while the accuracy of the analysis may nevertheless be compromised by a wide range of factors (data validity, data interpretation, context interpretation etc.). However, Quantitative analysis is no panacea for mistakes and discrepancies. The purpose of this White Paper is to consider the contemporary challenges of Quantitative Big Data analysis.

Quantitative Analysis is usually defined as analysis by means of “complex mathematical and statistical modelling”. This definition looks well beyond trivial number-crunching and also incorporates mining data sets for patterns and correlations. In other words, it can be used for any project, big or small, as long as the data can be represented via numerical values. Some of the advantages of using Quantitative Analysis Methods in Data Analytics are obvious. They are not only cost-effective and generally easier to implement than qualitative methods and tools, but also tend to produce clear output that appears to be easy to validate (by using established analysis tools and processes where all the steps can be confirmed), automate (AI systems are particularly good at number-crunching) and classify (classifying “numbers” is easier than classifying non-numerical values). However, the more we engage with the world of Big Data, the more concerns emerge about even some of the most reliable Quantitative Methods.

The critical limitations to consider before employing Quantitative Analysis Methods for Big Data analysis are:

  • Complexity of the Big Data Environments
  • Big Data Life Cycle Sensitivity
  • Data Labelling
  • Deregulation of the Big Data Standards
  • Interdependency of the Data Values

 

Complexity of the Big Data Environments

Quantitative analysis methods work best in single-format environments. A single-format environment ensures that the data values are consistent and well-defined rather than “open to interpretation”. However, it is the very complexity of multi-format environments that makes the data “Big”. Even if the values are compatible, analysts should also consider how these values were collected, which additional factors affect the data environment (these will vary depending on the data formats) and how consistent the data collection tools and methods were.
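To make the multi-format problem concrete, here is a minimal Python sketch of bringing two differently formatted sources to one shared format. The sources, field names, units and the assumption that source B reports UTC times are all hypothetical and chosen purely for illustration. Each record keeps a note of where it came from, so the question of how a value was collected stays visible after conversion.

```python
from datetime import datetime, timezone

# Hypothetical raw records: same kind of measurement, but different field
# names, units and timestamp formats in each source.
SOURCE_A = [{"temp_c": 21.5, "ts": "2020-03-15T10:00:00+00:00"}]
SOURCE_B = [{"temperature_f": 70.7, "time": "15/03/2020 10:05"}]


def from_source_a(rec):
    """Source A already uses Celsius and ISO 8601 timestamps."""
    return {
        "temperature_c": float(rec["temp_c"]),
        "timestamp": datetime.fromisoformat(rec["ts"]),
        "source": "A",  # provenance is kept alongside the value
    }


def from_source_b(rec):
    """Source B uses Fahrenheit and day/month/year times (assumed to be UTC)."""
    ts = datetime.strptime(rec["time"], "%d/%m/%Y %H:%M")
    return {
        "temperature_c": round((float(rec["temperature_f"]) - 32) * 5 / 9, 2),
        "timestamp": ts.replace(tzinfo=timezone.utc),
        "source": "B",
    }


def normalise(source_a, source_b):
    """Convert both sources to the shared format and order them by time."""
    records = [from_source_a(r) for r in source_a]
    records += [from_source_b(r) for r in source_b]
    return sorted(records, key=lambda r: r["timestamp"])


if __name__ == "__main__":
    for record in normalise(SOURCE_A, SOURCE_B):
        print(record)
```

Even in this toy case, the conversion embeds decisions (rounding, time zone) that are invisible in the final numbers, which is exactly why the collection context needs to be reviewed alongside the values.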

 

Big Data Life Cycle Sensitivity

Data life-cycles are getting shorter and shorter. On top of the generic principles of the diminishing value of data (e.g. 90/90), there are also significant concerns about bringing current and historic Big Data to a common denominator. Furthermore, data environments (as evident from the discussion above) are changing very fast, and so are the resulting data values. The number of data parameters also keeps increasing. Traditionally, the number of quantitative data parameters was usually limited to 8-10, even when dealing with the most complex industry cases (financial data etc.). Today, many Big Data Analytics projects require a far greater range of parameters.

Once the data becomes outdated, it has to be removed from the respective data set. Sometimes, entire data sets have to be removed from the analytics environment. However, removing entire data sets is not as much of a logistical challenge as going through all of the data sets and religiously reviewing all of the data that may or may not fit the newly emerging parameters/requirements. It may be difficult to do so consistently for all of the data sets because, with Big Data, each data set may have its own initial formatting (prior to being converted into a common shared format).
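The two review tasks described above, retiring outdated data and checking the remaining data against new requirements, can be sketched as follows. This is an illustrative outline only: the 90-day cut-off, the field names and the required parameter set are assumptions, not values taken from any particular project.

```python
from datetime import datetime, timedelta


def split_by_age(records, now, max_age):
    """Return (current, outdated) lists based on each record's 'collected' time."""
    current, outdated = [], []
    for rec in records:
        if now - rec["collected"] > max_age:
            outdated.append(rec)
        else:
            current.append(rec)
    return current, outdated


def missing_parameters(records, required):
    """List (record id, missing names) for records lacking required parameters."""
    report = []
    for rec in records:
        missing = sorted(required - set(rec))
        if missing:
            report.append((rec["id"], missing))
    return report


# Hypothetical sample data and thresholds.
NOW = datetime(2020, 3, 15)
RECORDS = [
    {"id": 1, "collected": datetime(2020, 3, 1), "price": 1.0, "region": "EU"},
    {"id": 2, "collected": datetime(2019, 1, 1), "price": 2.0, "region": "US"},
    {"id": 3, "collected": datetime(2020, 3, 10), "price": 3.0},
]

if __name__ == "__main__":
    current, outdated = split_by_age(RECORDS, NOW, timedelta(days=90))
    print("outdated ids:", [r["id"] for r in outdated])
    print("needs review:", missing_parameters(current, {"price", "region"}))
```

Dropping record 2 is the easy part; record 3 is the harder case, because someone has to decide whether a record that predates the new "region" requirement can still be used.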

 

Data Labelling

Data Labelling makes data sorting easier and more consistent, particularly when dealing with multiple data sets. But what should those labels be? Should the matching data be grouped on the basis of matching ALL of the parameters? Or a single parameter? Or should the grouping be data set-based?

In other words, the larger and more complex the data environment is, the harder it is to handle the labelling accurately. Studies that use generic labelling as the basis for data processing should be taken with a grain of salt. Generic labels make data sorting easier but significantly less accurate. It can be compared to putting all poultry products onto a single shelf and assuming that, because all of the items are classified as “poultry”, no further sorting is required!
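The poultry analogy can be shown in a few lines of Python. The items and their parameters below are invented for illustration; the point is only to compare grouping on a single generic parameter with grouping on all of the parameters.

```python
from collections import defaultdict

# Hypothetical items that all share the generic label "poultry".
ITEMS = [
    {"name": "chicken breast", "category": "poultry", "state": "raw", "storage": "chilled"},
    {"name": "chicken nuggets", "category": "poultry", "state": "cooked", "storage": "frozen"},
    {"name": "turkey mince", "category": "poultry", "state": "raw", "storage": "chilled"},
    {"name": "eggs", "category": "poultry", "state": "raw", "storage": "ambient"},
]


def group_by(items, keys):
    """Group item names under a label built from the chosen parameters."""
    groups = defaultdict(list)
    for item in items:
        label = tuple(item[k] for k in keys)
        groups[label].append(item["name"])
    return dict(groups)


if __name__ == "__main__":
    generic = group_by(ITEMS, ["category"])
    specific = group_by(ITEMS, ["category", "state", "storage"])
    print(len(generic), "group(s) with the generic label")
    print(len(specific), "group(s) when all parameters are matched")
```

The generic label puts frozen cooked nuggets and ambient eggs on the same "shelf", while matching all parameters separates them at the cost of more labels to maintain.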

 

Deregulation of the Big Data Standards

Increasing complexity of the data sets (a common trend in Big Data Analytics) also leads to deregulation of the Big Data Standards. From a technical angle, analysts would love to have consistent standards to follow, as it would make the entire data analytics process more consistent. Many efforts have been made to develop such standards to be shared across industries/projects. For example, the IEEE has come up with some Big Data Standards. At first glance, at least some of the standards appear to be quite comprehensive (some others are still “under development”). However, it turns out that following those standards and applying them consistently throughout analytics projects is not always possible. Furthermore, for the standards to become a norm, they first have to be accepted across the globe. Given that this is the domain of private companies/industries rather than governments, it is not going to happen in the foreseeable future.

Even within a single organisation, implementation of consistent Big Data Standards requires continuous monitoring. This can rarely be achieved through automation alone. As the author has pointed out in his recent Data Science White Paper on Data Processing Automation Challenges, automated monitoring is often unreliable, and ongoing “manual” reviews of the Big Data Standards (along with the consequent testing/monitoring of how these standards are being followed) will certainly increase the complexity of Big Data projects.

 

Interdependency of the Data Values

Interdependency of the Data Values refers to scenarios where a change to a single data value is going to impact other value(s) considered. Data value changes may even cause a “Chain Reaction”. It is particularly important to note implicit chain reactions, where the interdependency between the values is indirect rather than direct.

 

All it takes for a quantitative Big Data Analysis to be compromised is a single inaccurate value, or a single unsolicited or unaccounted-for change, and the Butterfly Effect will take place. The bigger the data is (and with some projects, data comes in 100s of formats), the higher the probability of such discrepancies. Understanding the interdependency is therefore one of the keys to being able to analyse the data successfully, but then again, errors may still happen!
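One way to reason about direct and implicit chain reactions is to write the interdependencies down as a graph and walk it. The sketch below is a simple breadth-first traversal over a hypothetical dependency map; the value names are made up, and a real project would have to derive such a map from its own models.

```python
from collections import deque

# Hypothetical dependency map: each key is a value, and the list holds the
# values that are computed directly from it.
DEPENDENTS = {
    "exchange_rate": ["unit_cost"],
    "unit_cost": ["margin"],
    "margin": ["forecast_profit"],
    "units_sold": ["forecast_profit"],
}


def affected_by(changed, dependents):
    """Map every value reachable from `changed` to its distance in steps.

    Distance 1 means a direct dependency; anything larger is an indirect
    (implicit) one.
    """
    distance = {}
    queue = deque([(changed, 0)])
    while queue:
        node, depth = queue.popleft()
        for child in dependents.get(node, []):
            if child not in distance:
                distance[child] = depth + 1
                queue.append((child, depth + 1))
    return distance


if __name__ == "__main__":
    for name, steps in affected_by("exchange_rate", DEPENDENTS).items():
        kind = "direct" if steps == 1 else "indirect"
        print(f"{name}: {kind} ({steps} step(s) away)")
```

In this toy map, an error in the exchange rate reaches the profit forecast three steps later, even though the two values are never linked directly.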

 

If Numbers Fail to Tell the Full Story, Who Can?

Based on the discussion of the Quantitative Big Data Analysis limitations above, it is easy to start wondering whether quantitative analysis methods work at all.

They certainly do! Limitations and challenges are no reason not to employ quantitative Big Data Analysis methods, but based on the author’s recent experiences with technology-driven Quantitative Data Analytics projects, it is always good to supplement testing and validation procedures with a Qualitative touch. It is the balancing of the Data Analysis methods that secures the success of DA projects.

It should also be noted that, to a large extent, the problems occur NOT because of the limitations of quantitative data analysis methods but because of our failure to employ these methods properly. Before we question the methods, we should ask ourselves: “Do we clearly understand the Data Analysis process that we are engaging in, or do we simply rely on the tools to do the job?” Usually, Quantitative Data Analysis studies incorporate several stages. With each stage completed, we must review the “work in progress” and validate it before moving on. It is the Butterfly Effect that compromises the validity of the analysis most! Making errors is inevitable, and having at least some discrepancies throughout the data analysis process is unavoidable. It is our ability to fix those mistakes that makes Data Analysis projects a success!
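The idea of validating the “work in progress” after every stage can be sketched as a small pipeline runner. The stage names, transformations and checks below are hypothetical placeholders; the pattern being illustrated is simply that no stage hands its output to the next one until a check has passed.

```python
def run_pipeline(data, stages):
    """Run (name, transform, check) stages in order, validating after each one."""
    for name, transform, check in stages:
        data = transform(data)
        problems = check(data)
        if problems:
            # Stop early so the error cannot propagate to later stages.
            raise ValueError(f"stage '{name}' failed validation: {problems}")
    return data


# Hypothetical transformations.
def drop_missing(rows):
    return [r for r in rows if r is not None]


def to_float(rows):
    return [float(r) for r in rows]


def mean(rows):
    return sum(rows) / len(rows)


# Hypothetical checks: each returns a list of problems (empty means OK).
def check_not_empty(rows):
    return [] if rows else ["no rows left"]


def check_range(rows):
    return [f"out of range: {r}" for r in rows if not 0 <= r <= 100]


def check_number(value):
    return [] if isinstance(value, float) else ["result is not a number"]


STAGES = [
    ("clean", drop_missing, check_not_empty),
    ("convert", to_float, check_range),
    ("aggregate", mean, check_number),
]

if __name__ == "__main__":
    print(run_pipeline(["10", None, "30"], STAGES))
```

An automated check like this does not replace the qualitative review the paper argues for, but it marks the points in the process where that review should happen.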

 

Comments:

Abhishek Mishra

19 Apr 2020 05:27:15 PM

Great nice written. I liked the below section:

Complexity of the Big Data Environments

Big Data Life Cycle Sensitivity

Data Labelling

Deregulation of the Big Data Standards

Interdependency of the Data Values

Sureshkumar Sundaram

20 Apr 2020 04:40:47 PM

Above article is really good and Its very interesting topics. i need more articles on big data research areas. If u have many papers kindly send me. Thanks you

Abhishek Mishra

25 Apr 2020 10:44:49 AM

Complexity of the Big Data Environments - A challenge for some startups, any thought what is the optimal solution for them


    © 2022 Data science Foundation. All rights reserved. Data S.F. Limited 09624670
