Linear Discriminant Analysis (LDA) is an effective classification method that is simple and easy to understand. It is also useful as a dimensionality reduction technique when the number of input features grows to the point where predictive modeling becomes difficult, a problem commonly called the curse of dimensionality. Dimensionality reduction refers to techniques that reduce the number of input variables in a dataset, which can make a model more robust.
As the name suggests, dimensionality reduction methods reduce the number of features (dimensions) in a dataset while preserving as much information as possible.
Logistic regression is a classification algorithm traditionally limited to problems with only two classes. If you have more than two classes, Linear Discriminant Analysis is a natural choice for classification. The LDA model consists of statistical properties of your data, calculated for each class. For a single input variable (x), these are the mean of the variable for each class and the variance of the variable (standard LDA assumes the variance is shared across classes). For multiple variables, these are the same properties calculated over the multivariate Gaussian, namely the class means and the covariance matrix.
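To make these statistical properties concrete, here is a minimal sketch that computes them by hand with NumPy. It assumes the scikit-learn wine dataset as example data and the standard LDA assumption of a single pooled covariance matrix; the variable names (class_means, priors, pooled_cov) are illustrative and not part of any library API.

```python
import numpy as np
from sklearn.datasets import load_wine

# Example data (an assumption for illustration): 178 wines, 13 features, 3 classes.
X, y = load_wine(return_X_y=True)
classes = np.unique(y)
n_samples, n_features = X.shape

# Mean vector of each class: one row per class, one column per feature.
class_means = np.array([X[y == c].mean(axis=0) for c in classes])

# Prior probability of each class: the fraction of samples it contains.
priors = np.array([np.mean(y == c) for c in classes])

# Pooled covariance matrix: within-class deviations summed over all classes,
# divided by (number of samples - number of classes).
pooled_cov = np.zeros((n_features, n_features))
for c, mean_c in zip(classes, class_means):
    centered = X[y == c] - mean_c
    pooled_cov += centered.T @ centered
pooled_cov /= n_samples - len(classes)

print(class_means.shape)  # (3, 13)
print(pooled_cov.shape)   # (13, 13)
```

These three quantities (class means, class priors and a shared covariance matrix) are what a Gaussian LDA classifier needs in order to score a new observation against each class.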
LDA makes predictions by estimating the probability that a new set of inputs belongs to each class. The class with the highest probability is the output class, and that is the prediction.
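The prediction rule described above can be illustrated with scikit-learn's LinearDiscriminantAnalysis class. This is a sketch only: the wine dataset, the 70/30 split and the random_state value are assumptions chosen for the example, not settings taken from this article.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)

# One row per test sample, one column per class; each row sums to 1.
probabilities = lda.predict_proba(X_test)

# The predicted class is the one with the highest estimated probability.
predicted = lda.classes_[np.argmax(probabilities, axis=1)]

print(probabilities[0])  # class probabilities for the first test sample
print(predicted[0])      # the class with the largest probability
```

Picking the arg-max of the probabilities by hand gives the same labels as calling lda.predict(X_test), which is a convenient way to check the rule.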
Linear Discriminant Analysis can be broken down into the following steps:
1. Calculate the within-class and between-class scatter matrices
2. Compute the eigenvectors and corresponding eigenvalues from the scatter matrices
3. Sort the eigenvalues in descending order and select the top k
4. Create a new matrix from the k eigenvectors that correspond to the selected eigenvalues
5. Take the dot product of the data and the matrix from the previous step to obtain the new features, which are the LDA components
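The steps above can be sketched in NumPy as follows. This is a minimal illustration rather than a reference implementation: it assumes the wine dataset as input, assumes the eigen-decomposition is taken of the matrix S_W^-1 S_B (the usual choice), and uses k = 2, since with three classes at most two components carry discriminative information.

```python
import numpy as np
from sklearn.datasets import load_wine

X, y = load_wine(return_X_y=True)
n_features = X.shape[1]
overall_mean = X.mean(axis=0)

# Step 1: within-class (S_W) and between-class (S_B) scatter matrices.
S_W = np.zeros((n_features, n_features))
S_B = np.zeros((n_features, n_features))
for c in np.unique(y):
    X_c = X[y == c]
    mean_c = X_c.mean(axis=0)
    centered = X_c - mean_c
    S_W += centered.T @ centered
    diff = (mean_c - overall_mean).reshape(-1, 1)
    S_B += X_c.shape[0] * (diff @ diff.T)

# Step 2: eigenvalues and eigenvectors of S_W^-1 S_B.
# np.linalg.solve(S_W, S_B) computes S_W^-1 S_B without forming the inverse.
eig_vals, eig_vecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
# The matrix is not symmetric, so tiny imaginary parts can appear; drop them.
eig_vals, eig_vecs = eig_vals.real, eig_vecs.real

# Step 3: sort eigenvalues in descending order and keep the top k.
k = 2
order = np.argsort(eig_vals)[::-1]

# Step 4: matrix whose columns are the k leading eigenvectors.
W = eig_vecs[:, order[:k]]

# Step 5: project the data onto the new axes to get the LDA components.
X_lda = X @ W

print(X_lda.shape)  # (178, 2)
```

Each column of X_lda is one LDA component; plotting the two columns against each other, coloured by class, is a quick way to see how well the classes separate.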
Below are the libraries required to implement LDA in Python:
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from matplotlib import pyplot as plt
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import LabelEncoder
import seaborn as sns
import pandas as pd
import numpy as np
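As a rough idea of how these pieces can fit together, the sketch below reduces the wine data to two LDA components and trains a decision tree on them. Note the assumptions: it uses scikit-learn's LinearDiscriminantAnalysis class, which is not in the import list above, and the split ratio, max_depth and random_state values are illustrative choices rather than settings from this article.

```python
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# Fit LDA on the training data only, then project both splits to 2 components.
lda = LinearDiscriminantAnalysis(n_components=2)
X_train_lda = lda.fit_transform(X_train, y_train)
X_test_lda = lda.transform(X_test)

# Train a small decision tree on the LDA components.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X_train_lda, y_train)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, tree.predict(X_test_lda))
print(cm)
```

The confusion matrix can then be passed to seaborn's heatmap function for a visual summary, if desired.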
There are many variations on the original Linear Discriminant Analysis model, which we will cover in future posts.
If you found this article interesting, why not review the other articles in our archive?