Often a regression model overfits the data on which it is trained. A basic understanding of this concept is important before jumping into the techniques that mitigate its negative impact. So, let's become familiar with the practice of regularization. What is regularization in machine learning, and how can it help with overfitting? Can we explain it in layman's terms?
We can; let us understand it with the example of raising a child. As parents, we determine how much flexibility our children will have during their upbringing. Too much constraint can stifle their character and development; alternatively, too much freedom can ruin their future. Reasonable flexibility is the best solution: give sufficient freedom with some constraints. Think of fulfilling their expectations on things like chocolates and mobile games for their happiness, and then adding a few regularizations, like sharing chocolates with their sister or spending time on homework.
Likewise, in machine learning, a model trained on observed data often uses more parameters than are needed to describe the problem, so it fits the noise rather than generalizing well. Using the regularization procedure, we seek to reduce the complexity of the regression function without necessarily reducing the degree of the polynomial function that underlies it. There are a few ways to do this:
- Reduce the number of predictors
- Reduce the importance given to the predictor variables (regularization)

Diminishing the weight given to the variables moves the model towards the simplest possible model, i.e. $Y = \bar{Y}$, which discards all predictors.
A very basic model generalizes the data extremely poorly, while an overly complex model overfits and may not perform well on test data. Regularization helps us select the complexity of the model so that it predicts better on unseen data. Regularization is nothing but adding a penalty term to the objective function and using that penalty term to control the complexity of the model.
Our machine learning algorithm's goal is to learn the patterns in the data and ignore the noise in the data set. If the model is too loose (underfit) or too complex (overfit), performance suffers. So, let's see how to solve this problem.
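To make the penalty-term idea concrete, here is a minimal sketch of a penalized least-squares objective. It is illustrative only: the function name, the toy data, and the value of $\lambda$ (`lam`) are assumptions, not something prescribed by the techniques below.

```python
# A minimal sketch of a penalized objective for a linear model y ≈ w0 + X @ w.
import numpy as np

def penalized_loss(w0, w, X, y, lam, norm="l2"):
    """Mean squared error plus a penalty term that controls model complexity."""
    residuals = y - (w0 + X @ w)
    mse = np.mean(residuals ** 2)
    if norm == "l1":
        penalty = lam * np.sum(np.abs(w))  # L1 (Lasso) penalty
    else:
        penalty = lam * np.sum(w ** 2)     # L2 (Ridge) penalty
    return mse + penalty

# Toy usage: a larger lam pushes the optimizer towards smaller weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, 0.0, -2.0]) + rng.normal(scale=0.1, size=100)
print(penalized_loss(0.0, np.array([1.5, 0.0, -2.0]), X, y, lam=0.1))
```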
L1 Regularization (Lasso)
With the L1 norm (Lasso), we try to generate a sparse solution: the weights of insignificant features are driven to zero while useful features keep non-zero weights, which amounts to built-in feature selection.
$$J(w) = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - (w_0 + w^T x_i) \right)^2 + \lambda \lVert w \rVert_1$$
In L1 regularization, we penalize the sum of the absolute values of the weights.
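As a quick illustration, here is a hedged sketch using scikit-learn's `Lasso`; the synthetic data and the `alpha` value (which plays the role of $\lambda$ above) are assumptions chosen only to make the sparsity visible.

```python
# Lasso drives the weights of irrelevant features exactly to zero.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
# Only features 0 and 3 actually matter; the other eight are pure noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.1)
lasso.fit(X, y)
print(lasso.coef_)  # most coefficients come out exactly zero (a sparse solution)
```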
L2 Regularization (Ridge)
$$J(w) = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - (w_0 + w^T x_i) \right)^2 + \lambda \lVert w \rVert_2^2$$
The regularization term in L2 is the sum of the squares of all the weights, as shown in the equation above. L2 regularization shrinks the weights towards zero but does not make them exactly zero, so the solution is non-sparse.
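A comparable sketch with scikit-learn's `Ridge` on the same kind of assumed toy data shows the contrast: the coefficients shrink but, unlike Lasso, stay non-zero.

```python
# Ridge shrinks all weights towards zero without zeroing any of them out.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=200)

ridge = Ridge(alpha=1.0)
ridge.fit(X, y)
print(ridge.coef_)  # small but non-zero weights everywhere: a non-sparse solution
```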
Elastic Net Regularization

Elastic Net was developed to improve on the lasso, whose variable selection can be too data-dependent and thus unstable. The solution is to combine the ridge and lasso penalties to deliver the best of both worlds. Elastic Net aims to minimize the loss function below.
$$J(w, \lambda_1, \lambda_2) = \lVert y - Xw \rVert^2 + \lambda_2 \lVert w \rVert_2^2 + \lambda_1 \lVert w \rVert_1$$
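Scikit-learn exposes this as `ElasticNet`, where `alpha` sets the overall penalty strength and `l1_ratio` mixes the two terms (1.0 is pure Lasso, 0.0 is pure Ridge); the values used here are assumptions for illustration.

```python
# Elastic Net combines L1 sparsity with L2 shrinkage in one penalty.
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=200)

enet = ElasticNet(alpha=0.1, l1_ratio=0.5)
enet.fit(X, y)
print(enet.coef_)  # some zeros from the L1 part, shrinkage from the L2 part
```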
All of these are useful techniques that can help make regression models more accurate. Scikit-Learn is a common library for implementing these algorithms, and I will write a detailed article on it with an A-to-Z code implementation. I hope you enjoyed reading this article.
If you found this article interesting, why not review the other articles in our archive?