Can you explain the difference between a Test Set and a Validation Set?
The validation set is usually carved out of the available training data. It is used for model selection, such as choosing hyperparameters, and for detecting overfitting of the model being built. The test set, on the other hand, is used to evaluate the performance of the final trained machine learning model on data it has never seen. In simple terms, the differences can be summarized as follows: the training set is used to fit the parameters, i.e. the weights; the validation set is used to tune the hyperparameters; and the test set is used to assess the performance of the model, i.e. its predictive power and generalization.
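A minimal sketch of how the three sets are typically produced, assuming the data is a plain list of independent rows and a 60/20/20 split (the function name and fractions are illustrative choices, not a fixed rule). The model would be fit on `train`, hyperparameters compared on `val`, and `test` scored only once at the end.

```python
import random

def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle rows and split them into training, validation and test lists."""
    if val_frac + test_frac >= 1:
        raise ValueError("val_frac + test_frac must be less than 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a reproducible split
    n = len(shuffled)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 60 20 20
```

For time-ordered data, shuffling would leak future information into training; a chronological split is the usual alternative there.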
How can you deal with different types of seasonality in time series modelling?
Seasonality in a time series occurs when the series shows a pattern that repeats over a fixed period. For example, stationery sales decrease during the holiday season and air conditioner sales increase during the summer. Seasonality makes a time series non-stationary because the average value of the variable differs across time periods. Differencing is a common method of removing seasonality from a time series. Seasonal differencing is the difference between a value and the value one seasonal period earlier, y'(t) = y(t) - y(t - m), where the lag m is the length of the season (e.g. m = 12 if monthly data has yearly seasonality).
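A small sketch of seasonal differencing with lag m = 12, applied to two years of made-up monthly sales (the numbers are hypothetical). Because the second year repeats the first year's seasonal shape shifted up by 10, differencing removes the seasonal pattern and leaves only the year-over-year change.

```python
def seasonal_difference(values, lag=12):
    """Return y[t] - y[t - lag] for every t >= lag."""
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    return [values[t] - values[t - lag] for t in range(lag, len(values))]

# Hypothetical monthly sales: same seasonal shape each year, +10 in year two.
pattern = [5, 4, 6, 8, 12, 20, 25, 24, 15, 9, 6, 30]
sales = pattern + [x + 10 for x in pattern]
print(seasonal_difference(sales, lag=12))  # twelve values, all equal to 10
```

With pandas, `Series.diff(periods=12)` computes the same differences, leaving the first 12 entries as NaN.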
Is it possible to perform logistic regression with Microsoft Excel?
It is possible to perform logistic regression with Microsoft Excel. There are two ways to do it: a) use one of the many logistic regression add-ins available from third-party websites; b) build the model from the fundamentals of logistic regression, using Excel's computational power to calculate the predicted probabilities and the log-likelihood, and then maximize the log-likelihood (for example with Excel's Solver add-in).
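The "from fundamentals" route can be sketched outside a spreadsheet to show what the cells compute: a predicted probability per row, a log-likelihood term per row, and their sum, which is then maximized. In Excel that maximization would typically be done with Solver; here plain gradient ascent stands in for it, which is a substitution, not Excel's method. The hours/passed data is hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys):
    # The same quantity a spreadsheet would sum down a column
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Maximize the log-likelihood with plain gradient ascent."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # derivative of the row's log-likelihood w.r.t. z
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Hypothetical data: hours studied and whether the exam was passed
hours = [0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5]
passed = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(hours, passed)
print(b0, b1, log_likelihood(b0, b1, hours, passed))
```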
What is the goal of A/B Testing?
It is statistical hypothesis testing for a randomized experiment with two variants, A and B. The goal of A/B testing is to identify changes to a web page that maximize or increase an outcome of interest. An example could be testing which of two banner ad designs produces the higher click-through rate.
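One common way to run the test for the banner-ad example is a two-proportion z-test on click-through rates. The sketch below assumes large samples (so the normal approximation holds) and uses hypothetical counts; other tests, such as chi-squared or Bayesian approaches, are also used in practice.

```python
import math

def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Two-sided z-test for a difference in click-through rates."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    # Pooled rate under the null hypothesis that A and B perform the same
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF, written with erf
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical results: banner A 200 clicks / 10,000 views, banner B 260 / 10,000
z, p_value = two_proportion_z_test(200, 10_000, 260, 10_000)
print(z, p_value)
```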
How can you assess a good logistic model?
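There are various methods to assess the results of a logistic regression analysis:
• Classification (confusion) matrix: compares predicted and actual classes, showing the true positives, true negatives, false positives and false negatives.
• Concordance: measures the ability of the logistic model to differentiate between the event happening and not happening, i.e. how often an observation with the event receives a higher predicted probability than one without it.
• Lift: assesses the logistic model by comparing it with random selection, i.e. how much more often the event occurs among the highest-scored observations than in the population as a whole.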
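The three measures can be computed directly from actual outcomes and predicted probabilities. This is a minimal sketch with hypothetical scores; the 0.5 threshold and the top-20% cut for lift are illustrative choices. Counting ties as half makes the concordance value equal to the c-statistic (area under the ROC curve).

```python
def confusion_matrix(y_true, y_prob, threshold=0.5):
    """Count TP, FP, TN, FN after thresholding predicted probabilities."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

def concordance(y_true, y_prob):
    """Share of (event, non-event) pairs where the event has the higher score."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    score = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                score += 1
            elif pe == pn:
                score += 0.5  # tied pairs count as half
    return score / (len(events) * len(non_events))

def lift_at(y_true, y_prob, fraction=0.2):
    """Event rate in the top `fraction` of scores divided by the overall event rate."""
    ranked = sorted(zip(y_prob, y_true), reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(y_true) / len(y_true)
    return top_rate / base_rate

# Hypothetical outcomes and model scores
y_true = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.65, 0.6, 0.4, 0.35, 0.3, 0.2, 0.1]
print(confusion_matrix(y_true, y_prob))
print(concordance(y_true, y_prob), lift_at(y_true, y_prob))
```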
Can you use machine learning for time series analysis?
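Yes, but whether it is a good choice depends on the application. A time series can be reframed as a supervised learning problem by using lagged values of the series as input features and the next value as the target, after which standard machine learning models can be applied.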
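A sketch of the lag-feature reframing, assuming a univariate series and three lags (the helper name and lag count are illustrative). Any regression model could then be trained on `X` and `y`.

```python
def make_lag_features(series, n_lags=3):
    """Turn a univariate series into (features, target) pairs.

    Each row of X holds the previous `n_lags` observations (oldest first)
    and the matching entry of y is the value that follows them.
    """
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])
        y.append(series[t])
    return X, y

X, y = make_lag_features([10, 12, 13, 15, 18, 21], n_lags=3)
# X == [[10, 12, 13], [12, 13, 15], [13, 15, 18]]
# y == [15, 18, 21]
```

When evaluating such a model, the train/test split should follow time order rather than a random shuffle, so the model is never trained on observations that come after the ones it is tested on.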
What are Interpolation and Extrapolation?
Interpolation is estimating an unknown value that lies between known values, for example between two known values in a list. Extrapolation is estimating a value beyond the range of the known values by extending a known set of values or facts.
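Both ideas can be illustrated with the simplest case, a straight line through two known points. The points below are hypothetical, and linear estimation is only one choice (polynomial or spline interpolation are alternatives).

```python
def linear_estimate(x0, y0, x1, y1, x):
    """Estimate y at x on the straight line through (x0, y0) and (x1, y1)."""
    if x1 == x0:
        raise ValueError("x0 and x1 must differ")
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (x - x0)

# Hypothetical known points: (2, 10) and (4, 20)
print(linear_estimate(2, 10, 4, 20, 3))  # interpolation (3 lies between 2 and 4): 15.0
print(linear_estimate(2, 10, 4, 20, 6))  # extrapolation (6 lies beyond 4): 30.0
```

The extrapolated value is only as good as the assumption that the linear trend continues outside the known range, which is why extrapolation is generally riskier than interpolation.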
Why does data cleaning play a vital role in analysis?
Cleaning data from multiple sources and transforming it into a format that data analysts or data scientists can work with is a cumbersome process, because as the number of data sources increases, the time taken to clean the data grows rapidly with the number of sources and the volume of data generated in these sources. Cleaning data alone might take up to 80% of the time spent on an analysis, which makes it a critical part of the analysis task.
Which technique is used to predict categorical responses?
Classification techniques are used to predict categorical responses; they are widely used in data mining to classify data sets.
Python or R – Which one would you prefer for text analytics?
The best possible answer would be Python, because it has the pandas library, which provides easy-to-use data structures and high-performance data analysis tools.