Hadoop is an open-source software framework for storing and processing large amounts of data on clusters of commodity hardware. It has two main components: Data Storage and Management, and Processing and Computation (DataFlair, 2019)... Read More
CATEGORY: Data Science
Hey Data Scientists, are you fitting your models satisfactorily through an API? Call the plumber!
As a data scientist, you are often expected to create a self-service product or web application that can be deployed in near real time. If you are an established product manager, linking your product to statistics becomes imperative for gaining key information. An API (Application Programming Interface) gives you that power. Read More
Solving relationship issues in data science: Top regression techniques
When we hear about regression in data science, the two techniques that come to mind are linear and logistic regression, and many professionals end up learning those techniques well. There are, however, many other types of regression. In this article we will go through some of the key regression techniques. Read More
Ways to Improve the Accuracy of Machine Learning Models
Accuracy is perhaps the most important property of a machine learning model. The more effective and reliable the model, the more usable it is. Any improvement in accuracy can make all the difference in a competitive world. Read More
Distance Metrics in Machine Learning
"Everywhere is within walking distance if you have the time," said Steven Wright, an American comedian, writer and actor. This is true, but who has the time these days? As a result, distance becomes an important metric. Read More
Get Free Spark NLP for Healthcare Licenses to Fight COVID-19
John Snow Labs, the winner of the Data Science Technology award in the International Data Science Awards 2019, is making all its licensed software – Spark NLP for Healthcare, the Healthcare AI Platform, and the Curated Datasets – available for free to data scientists who are actively tackling COVID-19. Read More
CatBoost, The Spirit of a True Racer
CatBoost is a machine learning library used primarily for classification on categorical data. The more we work with datasets, the more subsets of data we discover. CatBoost enables you to build models without having to one-hot encode the data. The library can also be integrated with other ML libraries such as Keras and TensorFlow. Read More
Big Data Testing Challenges
Enterprise data has grown 650% in the last five years. As a result, about 85% of Fortune 500 organizations will be unable to exploit Big Data for competitive advantage. Data is the lifeline of an organization and is becoming more important each day. Big Data Testing is a trending topic in the software industry, and the various properties of Big Data, such as volume, velocity, variety, variability, value, complexity and performance, create many challenges. Read More
LightGBM – Fast and Furious
Every day, data scientists and machine learning experts try to refine algorithms to improve accuracy and deliver better results. Some succeed and some fail. In this article we will discuss one of the most successful machine learning algorithms, LightGBM. Read More
XGBoost: an efficient implementation of gradient boosting
The "XGBoost" algorithm has been triumphant in many machine learning competitions. It was introduced in 2014 and has been commended ever since. Tianqi Chen developed the XGBoost algorithm as part of a research project and presented a paper with Carlos Guestrin at the SIGKDD conference in 2016, which caught the attention of many machine learning experts by increasing accuracy by a significant margin. Read More