The world is full of devices that produce ever-increasing amounts of data. Phones, social networks, advances in medical imaging, video: all of these and more generate new data, and systems must be developed to deal with it. Devices and sensors in particular are producing unprecedented volumes of new descriptive data which, if processed and analyzed, can yield valuable insight. The field of data science is “rising at the crossing point of the fields of social science and statistics, information and computer science and design”.
Around 100 hours of video are uploaded to YouTube each minute, and it would take around 15 years to watch every video uploaded in a single day. AT&T is thought to “hold the world's largest volume of data in one remarkable database – its phone records database is 312 terabytes in size and contains pretty much 2 trillion lines”. Collectively, we send “204,000,000 messages, create 1,800,000 Facebook likes, send 278,000 Tweets, and up-load 200,000 photographs to Facebook, and 570 new sites spring into reality each minute”.
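The YouTube figure above can be sanity-checked with simple arithmetic. The sketch below assumes only the quoted upload rate of 100 hours per minute and nonstop viewing; the function name is illustrative. It gives roughly 16 years, the same ballpark as the figure quoted.

```python
# Back-of-the-envelope check of the YouTube upload figure quoted in the text.
HOURS_UPLOADED_PER_MINUTE = 100  # rate quoted in the text
MINUTES_PER_DAY = 24 * 60


def years_to_watch_one_day_of_uploads(hours_per_minute: float) -> float:
    """Years of nonstop viewing needed to watch a single day's uploads."""
    hours_uploaded_per_day = hours_per_minute * MINUTES_PER_DAY
    days_of_viewing = hours_uploaded_per_day / 24
    return days_of_viewing / 365


years = years_to_watch_one_day_of_uploads(HOURS_UPLOADED_PER_MINUTE)
print(f"{years:.1f} years of continuous viewing")
```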
Data science is the “investigation of where data originates from, what it speaks to and how it tends to be transformed into a significant asset in the production of business and IT systems”. A key part of data science is the use of the scientific method to form and test hypotheses in order to validate conclusions about underlying patterns in data.
The general goal of data science may appear straightforward; however, execution is an intricate process that involves several steps:
- Business Understanding
- Data Understanding
- Data Preparation
- DS team evaluation
- Stakeholder evaluation
The data and analytics landscape is evolving. The growing business around big data and data science is one effect of this transformation. Although the market frequently uses the terms big data and data science interchangeably, they are in fact quite distinct. Big data refers to the ability to manage huge volumes of disparate data at the right speed and within the right time frame to enable analysis and action. Big data is characterized by the three V's: volume, variety, and velocity; some would add veracity. Organizations are moving toward increasingly hybrid environments to handle this massive, multistructured data. These often include the cloud, Hadoop, and data lakes as well as NoSQL databases. Big data analysis frequently requires the use of massively parallel processing (MPP) engines, in-memory processing, and other technologies that can handle huge amounts of data.
- DATA SCIENCE LANDSCAPE
The data science landscape can be divided into the following categories: (i) fields, (ii) objects, (iii) techniques, and (iv) approaches.
- Data Science Fields
Data science draws on different fields such as “Nanotechnologies, Physics, Robotics, Mathematics, Statistics, Information theory, Information technology and Artificial Intelligence”.
- Data Science Objects
Methods that “scale to Big Data are of particular interest in data science, although the discipline is not generally considered to be restricted to such data”.
- Data Science Techniques
Data science techniques include “Signal processing, Probability models, Machine learning, Statistical learning, Data mining, Database, Data engineering, Pattern recognition, Visualization, Predictive analytics, Uncertainty modeling, Data warehousing, Data compression, Computer programming and High Performance Computing”.
- Data Science Approaches
The development of “machine learning, a branch of artificial intelligence used to uncover patterns in data from which predictive models can be developed, has enhanced the growth and importance of data science”.
- BIG DATA LANDSCAPE
To design a big data architecture, it is essential to understand the current big data landscape. In traditional data collection and storage, organizations processed useful information into structured and labeled databases, and various tried and tested techniques were then used to interrogate the data. However, with the emergence of big data, traditional structures could not cope with the “velocity, volume, value and variety of requirements”. To deal with these enormous data sets, new architectures have been devised that incorporate multi-node parallel processing techniques. The big data landscape is further classified by processing requirements, and different techniques are proposed for batch processing and real-time processing.
A few technologies through which we can harness big data are:
- Massively Parallel Processing
The data is distributed among multiple nodes for faster processing, with each node responsible for executing one part of the program. Each processing node has its own operating system and memory.
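The idea of splitting data across nodes that each work on their own share can be illustrated with a minimal sketch. This is not a real MPP engine: worker threads stand in for separate machines (real nodes would not share memory), and the `sales` rows, the round-robin partitioning, and all function names are assumptions made for illustration. The result corresponds to what a `SELECT region, SUM(amount) ... GROUP BY region` query would return.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_rows(rows, num_nodes):
    """Spread rows round-robin so each node owns a disjoint slice of the data."""
    partitions = [[] for _ in range(num_nodes)]
    for index, row in enumerate(rows):
        partitions[index % num_nodes].append(row)
    return partitions


def node_partial_sum(local_rows):
    """Work done on one node: aggregate only the rows stored locally."""
    totals = {}
    for row in local_rows:
        totals[row["region"]] = totals.get(row["region"], 0) + row["amount"]
    return totals


def merge_partials(partials):
    """Coordinator step: combine the per-node partial results."""
    merged = {}
    for partial in partials:
        for region, subtotal in partial.items():
            merged[region] = merged.get(region, 0) + subtotal
    return merged


def parallel_total_by_region(rows, num_nodes=3):
    partitions = partition_rows(rows, num_nodes)
    # Threads are only a stand-in for independent machines in this sketch.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(node_partial_sum, partitions))
    return merge_partials(partials)


# Hypothetical sample data.
sales = [
    {"region": "north", "amount": 10},
    {"region": "south", "amount": 5},
    {"region": "north", "amount": 7},
    {"region": "east", "amount": 3},
    {"region": "south", "amount": 1},
]
totals = parallel_total_by_region(sales)
print(totals)
```

Because each node only sees its own partition, the per-node work shrinks as nodes are added; the only shared step is the final merge.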
- MapReduce
MapReduce also uses the concept of “multi nodes and parallel processing”. It consists of two functions:
- Map - This function splits the data across multiple nodes, which then process it in parallel.
- Reduce - This function combines the result sets into a final response.
Massively parallel processing systems are queried with SQL, whereas MapReduce jobs are typically written in Java and do not require expensive dedicated platforms.
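The Map and Reduce functions described above can be sketched with the classic word-count example. This is a single-process illustration written in Python rather than a Java job on a real cluster: the `shuffle` grouping step, the function names, and the sample `chunks` are assumptions added for the example, and the map calls that run in a loop here would run on separate nodes in practice.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Group all values by key so each key can be reduced independently."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine every value seen for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    # Each chunk would be mapped on a different node in a real cluster.
    mapped = [map_phase(chunk) for chunk in chunks]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input, already split into two chunks.
chunks = ["big data big insight", "data science and big data"]
counts = word_count(chunks)
print(counts)
```

Since the map step treats every chunk independently and the reduce step treats every key independently, both can be spread over as many nodes as are available.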
- NoSQL Databases
NoSQL database-management systems are “unlike relational database-management systems, in that they don't utilize SQL as their query language”. They forgo the overhead of indexing, schemas, and ACID transactional properties in order to build large, replicated data stores for running analytics on low-cost hardware, which is useful for handling unstructured data.
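To make the schema-free idea concrete, here is a minimal sketch of a key-document store. `TinyDocumentStore` is a hypothetical in-memory class written for this example, not the API of any real NoSQL product; it accepts documents with differing fields, has no index, and is queried with a plain Python function instead of SQL.

```python
import json


class TinyDocumentStore:
    """Hypothetical in-memory key-document store; not a real NoSQL client."""

    def __init__(self):
        self._docs = {}

    def put(self, key, document):
        # No schema check: any JSON-serializable dict is accepted.
        self._docs[key] = json.dumps(document)

    def get(self, key):
        return json.loads(self._docs[key])

    def find(self, predicate):
        """Return matching documents; with no index this is a full scan."""
        documents = (json.loads(raw) for raw in self._docs.values())
        return [doc for doc in documents if predicate(doc)]


store = TinyDocumentStore()
store.put("u1", {"name": "Ada", "tags": ["admin"]})
store.put("u2", {"name": "Lin", "last_login": "2020-01-01"})  # different fields
store.put("img7", {"type": "image", "width": 640, "height": 480})

# Query with a function rather than SQL.
named = store.find(lambda doc: "name" in doc)
print([doc["name"] for doc in named])
```

The trade-off shown here is the one the text describes: writes are cheap and flexible, while queries pay for the missing schema and index with a scan.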
- Hadoop Ecosystem
Hadoop is an “open-source software framework used for storing and processing Big Data in a distributed manner on large clusters of commodity hardware”.
- BIG DATA AS THE NEW FRONTIER OF DATA SCIENCE
Phenomena such as the data deluge, the emergence of new and alternative data sources like the Internet, sensors, and images, and the availability of data that is generated automatically rather than collected ad hoc have made it clear that relations between scientific fields cannot be confined to binary interdisciplinary relationships. What is needed instead is triangulation, a transdisciplinary approach, and the identification of a data-driven scientific method.
Data science, big data, and advanced analytics have been “progressively perceived as significant main impetuses for cutting edge advancement, economy, and instruction”. Even though big data is at an early stage of development, vital conversations about the “master plan, patterns, significant difficulties, future headings, and possibilities are required for the proper advancement of the field and the network”. The next generation of data science spans a wide scope: business creation, scientific development, and the economy as a whole. Questions such as "why do we need data science" will most likely be supplanted by a body of scientific theories and tools that address the visible challenges and problems facing tomorrow's big data users. We will be astonished by the advances and potential changes that occur over the next 50 years.