As the smart information age matures, data has become the most powerful resource enterprises have at their disposal. Data is now widely regarded as the world's most valuable commodity, ahead of even crude oil. This paper attempts to provide a thorough understanding of the trustworthiness of data within the big data framework, along with insight into the meaning, importance and structure of big data. The first part of the paper establishes the need for data and its truthfulness, while the latter part explains the history, framework, challenges and use cases of big data.
Data is the new oil of the digital era, and the companies that deal in it, namely Alphabet (Google's parent company), Amazon, Apple, Facebook and Microsoft, look unstoppable. These titans are the five most valuable listed firms in the world (Economist, 6 May 2017). All industries have welcomed this digital transformation, often betting their worth on insights mined from collected data. Data that is not properly used, or that sits unstructured on an inaccessible IT atoll, can prove detrimental to the reliability of business processes. This raises the question of the truthfulness of data: if key people inside a business cannot trust their company's data, how can external stakeholders know they are in good hands? More than 80% of a data professional's time is spent on finding, cleansing, understanding and integrating data for the business problem at hand.
Businesses want to place data at the epicenter of their decision-making processes to reduce mistakes and take full advantage of their core competences. The volume of data is growing faster than data-handling technology is advancing. Inequalities between organizations in how well they use data will become more glaring, moving from competitive edges to critical business advantages. We are talking about managing the huge flow of data in and around businesses: big data. In today's business cycle the impact of big data is not only evolving but increasing at rocket speed, as every facet of business, and every machine, runs on data. Managing data is crucial to business success in the digital era, so the job of a Chief Data Officer is not merely to manage data but to maintain its trustworthiness in a changing and evolving space.
What is data, and why are we talking about it?
In specific terms, data is information for businesses, but in a general sense data is part of daily life. A few decades ago, data referred to facts drawn from an existing knowledge base to solve critical business problems. But now we are living in an ocean of data. As digital life advances, people prefer to do things digitally rather than manually, whether at business places, in household work, or even while sleeping at night. The whole of day-to-day life is covered by data, and the volume is not stopping but climbing higher and higher. Even our bodies produce millions of data points every day in the form of heart rhythm, blood flow, calories burned, nerve activity in the brain, and so on. To understand the importance, consider the life of an ordinary college student: his day starts with the wake-up alarm on his smartphone, followed by a peek at his social media accounts, a check of the day's class schedule through a digital channel, and so on.
Even washrooms are full of data-producing machines: smart shower systems that record daily usage information, such as the flow rate and kind of water we want each day, automatic temperature-controlled toilet seats, smart geysers and many more. Similarly, while travelling to the workplace, we generate gigabytes of data through music playback, route navigation systems, traffic control and so on. All this data comes together to form a vast sea of data that can be managed only through the concept of big data.
Our lives revolve around data, a huge amount of data known as BIG DATA. The big question is how to understand and use this data for decision making. Should we trust all of it? As every single activity of society and business is linked to big data, the field is developing at a massive rate. The next section provides detailed insight into big data, its effects and its handling.
Big Data Understanding
In simple terms, big data is a term used to describe a collection of data that is huge in size and still growing exponentially with time. In today's digital scenario, commercial and social interactions among people, between people and machines, and among machines produce a constant stream of data to monitor and analyze. Social interactions, mobile devices, facilities, equipment, R&D, simulations, and physical infrastructure all contribute to this endless flow. In aggregate, this explosion in data sources is at the heart of big data. Big data technology can include a range of fit-for-purpose software and hardware able to address the scenarios depicted in Figure 1 below.
Figure 1: Big Data Framework
In short, such data is so large and complex that none of the traditional data management tools are able to store it or process it efficiently.
Types of Big Data:
Structured Data:
Data that is stored, accessed and processed in a fixed format is termed structured data. Various techniques exist to handle such data, but with the arrival of big data they struggle with the sheer volume involved. A typical example of structured data is a relational database; as data volumes head towards zettabytes, established techniques such as SQL-based systems also face difficulty handling them.
A company employment database:
| Employee ID | Employee Name | Gender | Job Level/Designation | Annual Salary (LPA) |
|---|---|---|---|---|
| 125250 | Raghav Grover | Male | JL3/System Engineer | 3.9 |
| 75580 | Sagar Dingerja | Male | JL4/Senior System Engineer | 4.4 |
| 74963 | Sakshi Mittal | Female | JL5/Technology Analyst | 5.2 |
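To make the idea of a fixed format concrete, the following is a minimal sketch that loads the three rows above into an in-memory SQLite database using Python's standard `sqlite3` module. The table name `employee` and the column names are assumptions chosen for illustration; they are not part of the original database.

```python
import sqlite3

# In-memory database; the schema fixes the name and type of every column up front.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        employee_id   INTEGER PRIMARY KEY,
        employee_name TEXT NOT NULL,
        gender        TEXT NOT NULL,
        designation   TEXT NOT NULL,
        salary_lpa    REAL NOT NULL
    )
    """
)

# The three rows from the example table (column names are illustrative).
rows = [
    (125250, "Raghav Grover", "Male", "JL3/System Engineer", 3.9),
    (75580, "Sagar Dingerja", "Male", "JL4/Senior System Engineer", 4.4),
    (74963, "Sakshi Mittal", "Female", "JL5/Technology Analyst", 5.2),
]
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?)", rows)

# Because the format is fixed, a declarative query can filter and sort directly.
well_paid = conn.execute(
    "SELECT employee_name, salary_lpa FROM employee "
    "WHERE salary_lpa > ? ORDER BY salary_lpa DESC",
    (4.0,),
).fetchall()
print(well_paid)
```

The query works only because every row has the same columns with known types, which is exactly the property that unstructured data lacks.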
Unstructured Data:
Any data whose form or structure is unknown is termed unstructured data. Today big data consists mainly of unstructured data; not only is it huge in size, but handling it poses various challenges in deriving value from it. Combinations of music, videos, text messages, etc. are all unstructured data. The biggest example of unstructured data is the data produced on social media such as WhatsApp, Instagram and Facebook. Even what we search for on Google is a kind of unstructured data.
Semi-Structured Data:
This type of data contains both forms of data (i.e. structured and unstructured). It refers to information that does not reside in a relational database but has some organizational properties that make it easier to analyze. With some processing it can be stored in a relational database, but in some cases that is hard to do. One example of semi-structured data is an XML document.
XML is semi-structured in that it uses tags to mark up the information it contains.
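As an illustration of tagged, semi-structured data, the sketch below parses a small XML document with Python's standard `xml.etree.ElementTree` module. The XML content, including the `skills` element, is hypothetical and only loosely modelled on the employee example; the point is that the two records do not share the same fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: tags give the data some structure,
# but records are not required to carry the same fields.
xml_text = """
<employees>
  <employee id="125250">
    <name>Raghav Grover</name>
    <designation>System Engineer</designation>
  </employee>
  <employee id="74963">
    <name>Sakshi Mittal</name>
    <skills><skill>Java</skill><skill>SQL</skill></skills>
  </employee>
</employees>
""".strip()

root = ET.fromstring(xml_text)

records = []
for emp in root.findall("employee"):
    records.append({
        "id": int(emp.get("id")),
        "name": emp.findtext("name"),
        # findtext returns None when the tag is absent from this record.
        "designation": emp.findtext("designation"),
        # An empty list when the record has no <skills> element.
        "skills": [s.text for s in emp.findall("skills/skill")],
    })

for record in records:
    print(record)
```

Missing fields surface as `None` or an empty list instead of violating a schema, which is why such data is easier to analyze than free text but harder to fit into fixed relational tables.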
History of Big Data:
Big data first came into the picture around the 1960s and 1970s, when the world of data was just getting started with the first data centers and the development of the relational database. At that time, however, hardly anyone paid attention to the concept.
After that, big data existed in various forms. In 1992, Teradata systems were the first to store and analyze 1 terabyte of data, at a time when hard disks typically held 2.25 GB. In 2007, Teradata installed the first petabyte-scale relational database system. Capacity kept growing as the volume of data to be stored grew, larger storage devices were needed, and eventually people became aware of big data. On the subject of sizes, the smallest unit of data is the bit, and the list below follows the binary convention in which each unit is 1024 times the previous one:
1 bit = Single 1 or 0, binary unit
1 Byte = 8 bits
1 Kilobyte = 1024 bytes
1 Megabyte = 1024 KB
1 Gigabyte = 1024 MB
1 Terabyte = 1024 GB
1 Petabyte = 1024 TB
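The unit ladder above can be expressed as a small helper. This is a sketch under the same binary convention (factors of 1024); the names `UNITS` and `human_readable` are illustrative.

```python
# Units in ascending order; each is 1024 times the previous one.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB"]


def human_readable(num_bytes):
    """Express a byte count in the largest listed unit that keeps the value >= 1."""
    value = float(num_bytes)
    unit_index = 0
    # Divide by 1024 until the value is below 1024 or the largest unit is reached.
    while value >= 1024 and unit_index < len(UNITS) - 1:
        value /= 1024
        unit_index += 1
    return f"{value:.2f} {UNITS[unit_index]}"


print(human_readable(1536))       # one and a half kilobytes
print(human_readable(1024 ** 4))  # one terabyte
print(human_readable(1024 ** 5))  # one petabyte
```

Seen this way, a petabyte system is 1024 times larger than a terabyte system, not merely an incremental step up.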
But big data really became the talk of the town around 2005, when people began to realize how much data was being generated through Facebook (launched by Mark Zuckerberg in 2004), and later through YouTube and other online services.
To handle this amount of data, various techniques and frameworks were developed around 2004-2005: Google introduced the MapReduce technique for processing huge amounts of data, and Hadoop (an open-source framework created specifically to store and analyze big data sets) was developed in the same period. NoSQL also began to gain popularity during this time.
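The idea behind the map-and-reduce style of processing can be sketched on a single machine in plain Python. This is only a toy word count showing the map, shuffle and reduce steps; it is not Google's or Hadoop's implementation, and the function names are illustrative.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


documents = ["big data is big", "data is the new oil"]

# Each document could be mapped on a different machine; here it is done in a loop.
mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

Because each document is mapped independently and each key is reduced independently, both phases can in principle be spread across many machines, which is what makes the approach suitable for very large data sets.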
The development of open-source frameworks such as Hadoop (and, more recently, Spark) was essential for the growth of big data because they make big data easier to work with and cheaper to store. In the years since then, the volume of big data has skyrocketed. Users are still generating huge amounts of data, but it is not just humans who are doing it.
While big data has come far, its usefulness is only just beginning. Cloud computing has expanded big data possibilities even further. The cloud offers truly elastic scalability, where developers can simply spin up ad hoc clusters to test a subset of data.
Three Vs of Big Data:
For a better understanding, we can say that big data is data that contains greater variety, arriving in increasing volumes and with ever-higher velocity. These are known as the three Vs.
Volume: The amount of data matters. With big data, you will have to process high volumes of low-density, unstructured data (data in the form of photos, videos, messages, etc.). This can be data of unknown value, such as Twitter data feeds, Facebook activity or daily music streams. For some organizations, this might be tens of terabytes of data. For others, it may be hundreds of petabytes.
Velocity: Velocity is the rate at which data is received and (perhaps) acted on. Normally, the highest-velocity data streams directly into memory rather than being written to disk. Some internet-enabled smart products operate in real time or near real time and require real-time evaluation and action.
Variety: This refers to the many types of data that are available. Traditional data types were structured and fit neatly in a relational database. With the rise of big data, data comes in new unstructured data types. Unstructured and semi-structured data types, such as text, audio, and video, require additional preprocessing to derive meaning and support metadata.
Due to the massive increase in data, two additional Vs have emerged: value and veracity.
Value: Every piece of data has intrinsic value, but it is of no use until that value is discovered. That is why value has been added as a characteristic of big data; it determines which data to prioritize and which to set aside. Think of some of the world's biggest tech companies. A large part of the value they offer comes from their data, which they are constantly analyzing to produce more efficiency and develop new products. Finding value in big data is not only about analyzing it (which is a whole other benefit). It is an entire discovery process that requires insightful analysts, business users, and executives who ask the right questions, recognize patterns, make informed assumptions, and predict behavior.
Veracity: Alongside value, an equally important aspect is how truthful your data is and how far you can rely on it; this is the veracity of big data. With big data now larger in volume, cheaper and more accessible, users who can trust their data can make more accurate and precise business decisions, and so plan correctly and succeed more easily.
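Veracity can be made measurable with simple quality checks. The sketch below is one possible approach, not a standard metric: it reports the share of records that are complete and the share that also pass a set of validity rules. The record fields, the rules and the function name `veracity_report` are all hypothetical.

```python
def veracity_report(records, required_fields, validators):
    """Return the share of records that are complete, and the share that are
    complete and also pass every validity rule."""
    total = len(records)
    complete = 0
    valid = 0
    for record in records:
        # A record is complete when no required field is missing or empty.
        is_complete = all(record.get(f) not in (None, "") for f in required_fields)
        if is_complete:
            complete += 1
            if all(check(record) for check in validators):
                valid += 1
    return {
        "completeness": complete / total if total else 0.0,
        "validity": valid / total if total else 0.0,
    }


# Hypothetical customer records with typical quality problems.
records = [
    {"id": 1, "age": 34, "email": "a@example.com"},
    {"id": 2, "age": -5, "email": "b@example.com"},  # impossible age
    {"id": 3, "age": 51, "email": ""},               # missing email
    {"id": 4, "age": 28, "email": "d@example.com"},
]
validators = [
    lambda r: 0 <= r["age"] <= 120,  # plausible age range (assumed rule)
    lambda r: "@" in r["email"],     # very rough email check (assumed rule)
]

report = veracity_report(records, ["id", "age", "email"], validators)
print(report)
```

Tracking such ratios over time gives internal users and external stakeholders a concrete basis for deciding how much to trust a data set.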
Big Data Use cases
Big data, together with the techniques used to store and handle it, can help you address a range of business activities, from customer service to analytics, and from producing a product to earning the maximum profit from it. Some use cases are described below:
Product Development: Companies such as Netflix and Amazon Prime use big data to anticipate customer demand. This helps both end users and product owners, as this kind of business is growing quickly. People browse many things every day, such as movies, TV shows and music. Big data techniques analyze each user's interests, store a profile of them and form a picture of user demand, which in turn lets customers browse content matched to their interests rather than having to think about and search for it.
Predictive Maintenance: Factors that can predict mechanical failures may be deeply buried in structured data, such as the year, make, and model of equipment, as well as in unstructured data that covers millions of log entries, sensor data, error messages, and engine temperature. By analyzing these indications of potential issues before the problems happen, organizations can deploy maintenance more cost effectively and maximize parts and equipment uptime so that the efficiency can be increased with fewer failures.
Customer Experience: A clearer view of customer experience is more possible now than ever before. Big data enables you to gather data from social media, web visits, call logs, and other sources to improve the interaction experience and maximize the value delivered. Start delivering personalized offers, reduce customer effort, and handle issues more accurately.
Similarly, many other use cases show that big data and its techniques can make tasks much easier: the end user's experience improves markedly, while owners can handle huge sets of information precisely and earn more profit.
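As a small illustration of the predictive maintenance use case, the sketch below flags sensor readings that deviate sharply from the recent past. It is a deliberately simple rule (distance from the rolling mean measured in standard deviations), not a production failure-prediction model; the readings, window size and threshold are assumed values.

```python
from statistics import mean, stdev


def flag_anomalies(readings, window=5, threshold=3.0):
    """Return the indices of readings that lie more than `threshold` standard
    deviations away from the mean of the preceding `window` readings."""
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(readings[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged


# Hypothetical engine temperature log with one sudden spike.
temps = [70.1, 70.4, 69.8, 70.2, 70.0, 70.3, 69.9, 88.5, 70.1, 70.2]
print(flag_anomalies(temps))
```

On real equipment the same idea would be applied to millions of log entries and sensor values, so that maintenance can be scheduled when readings start to drift instead of after a failure.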
Big Data Challenges:
The first challenge is the data itself. Although big data technologies such as Hadoop have been developed to store and handle it, data keeps growing on a massive scale, so even these technologies face issues, and organizations still struggle to keep pace with their data and find effective ways to store it.
Moreover, storing the data is not enough. Data must be used to be valuable, and that depends on curation (selecting what is relevant from massive sets). Cleaning data, so that it is organized and relevant to the end user, requires a great deal of work before meaningful analysis is possible. Engineers spend around 70 percent of their time curating and preparing data according to their clients' needs.
Hadoop was initially developed to store and process big data; Apache Spark was introduced later to speed up processing, but it too faces challenges. Today the two frameworks are often used in combination and remain among the most effective technologies for working with big data. Keeping up with big data technology is an ongoing challenge.
How Big Data Works:
Now that we understand what big data is and why it needs to be understood, let us look at how big data works:
Working with big data basically involves three stages:
Integrate: Big data brings together data from many discrete sources and applications. With a mechanism such as ETL (extract, transform and load), the data is first extracted, then transformed according to the requirements of the analysis to be performed, and finally loaded. This is one strategy; various other strategies and technologies exist for analyzing big data sets at terabyte, or even petabyte, scale. During integration, the data is brought in, processed and made available in a form that business analysts can start working with.
Manage: This is the second stage. After loading, the data must be managed so that we can work with it. Big data requires storage; the data can be stored in any form, and the desired processing requirements and engines can be brought to those data sets on demand. Today the most popular storage solution is the cloud, for example Google Cloud Platform or Azure, as it supports current compute requirements and lets resources be used as needed.
Analyze: This is the final stage. Your investment in big data pays off when you analyze the data and act on it according to user demands and requirements. Visual analysis of varied data sets can yield new discoveries, show what new research the data supports, and even form the basis for building something new. This is basically how big data works and delivers its results.
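The three stages can be sketched end to end in a few lines of standard-library Python. This is a toy illustration under stated assumptions: the two sources, their field names and the in-memory list standing in for managed storage are all hypothetical, and a real deployment would use dedicated ETL tooling and cloud storage.

```python
import csv
import io
import json
from collections import defaultdict

# Two hypothetical sources that describe purchases in different shapes.
CSV_SOURCE = """customer,amount,currency
alice,120.50,USD
bob,80.00,USD
"""
JSON_SOURCE = '[{"user": "alice", "total_cents": 4550}, {"user": "carol", "total_cents": 9900}]'


def extract():
    """Integrate (extract): read raw records from each source."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    json_rows = json.loads(JSON_SOURCE)
    return csv_rows, json_rows


def transform(csv_rows, json_rows):
    """Integrate (transform): map both shapes onto one schema, amounts in cents."""
    unified = [
        {"customer": row["customer"], "cents": round(float(row["amount"]) * 100)}
        for row in csv_rows
    ]
    unified += [{"customer": row["user"], "cents": row["total_cents"]} for row in json_rows]
    return unified


def load(store, records):
    """Manage (load): append the unified records to the storage layer."""
    store.extend(records)


def analyze(store):
    """Analyze: total spend per customer across all sources."""
    totals = defaultdict(int)
    for record in store:
        totals[record["customer"]] += record["cents"]
    return dict(totals)


store = []  # stands in for managed storage such as a cloud data store
load(store, transform(*extract()))
totals = analyze(store)
print(totals)
```

Only after the two sources share one schema can a single analysis answer a question that neither source could answer alone, which is the practical payoff of the integration stage.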
The big data framework is becoming a necessity for every organization; it is not only a big opportunity for institutions but will also put pressure on chief information officers. Big data provides a complete framework for making IT a more valued asset to the business. Big data implementation projects sit at the frontier of the business, where many of the most significant opportunities for expansion or cost reduction lie. Taking the lead in big data implementation gives a business a strategically important competitive advantage through a data management and IT infrastructure strategy that requires out-of-the-box thinking and moving outside the traditional IT comfort zone.
https://www.economist.com/leaders/2017/05/06/the-worlds-most-valuable-resource-is-no-longer-oil-but-data retrieved on 12/06/2019 at 10:15 PM
https://www.upgrad.com/blog/what-is-big-data-types-characteristics-benefits-and-examples/ retrieved on 13/06/2019 at 11:15 PM
https://www.sciencedirect.com/topics/computer-science/big-data-system retrieved on 13/06/2019 at 12:15 PM
https://intellipaat.com/blog/7-big-data-examples-application-of-big-data-in-real-life/ retrieved on 15/06/2019 at 10:15 PM
https://www.cisco.com/c/en_in/solutions/data-center-virtualization/big-data/index.html retrieved on 15/06/2019 at 08:15 PM