- Explain linear discriminant analysis from the very basics
- Illustrate it further through hands-on practice

**What you are already supposed to know:**

- Bayes’ theorem

**Props needed:**

- A notebook and a pen

Linear discriminant analysis (LDA) is a classification algorithm that uses Bayes’ theorem to calculate the probability of a particular observation falling into a labeled class. It has an advantage over **logistic regression** in that it can be used in multi-class classification problems and remains relatively stable when the classes are highly separable.

Although you can easily implement LDA through various software packages like R or scikit-learn (Python), it is equally important to know what actually goes on in the background when you use these tools. To understand the logic and mathematics behind it, I personally believe that hands-on practice is always effective. This article will show you the different calculations and derivations involved in LDA, and I recommend you use pen and paper to replicate and recalculate everything.

Before we proceed further we need to understand the assumptions behind LDA, which are as follows:

- The distribution of the variable in every class is normal (Gaussian distribution)
- The variance of the variable is equal in every class

This may sound a little over your head, but I will try to make it simpler and clearer throughout the article. LDA was designed to solve multi-class classification problems, but for simplicity we will consider a two-class classification problem with a single predictor variable. Consider a very simple example of predicting the gender of a person from his/her height, using the data shown below:

The above data set will be used to develop a model through LDA, but before doing that we will check it against the assumptions of LDA to see whether this particular algorithm can be applied. We first make a frequency table out of it, with the height column grouped into class intervals as shown below:

You see, the frequency of every class interval is written against it for both classes, and we will plot the above table to check how well the first assumption is met.

The above graph shows that the height variable is normally distributed in both classes, hence the first assumption is satisfied.

Now let’s calculate the mean and
variance for the two classes.

The variance of the variable under consideration is almost equal in both classes, and with that the second assumption of LDA is met.
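These per-class statistics are easy to compute by hand or in a few lines of code. Since the original data table appears only as an image in this post, the heights below are illustrative placeholders (my assumption, not the article's data), keeping the 5-male / 6-female split used later:

```python
# Per-class mean and (population) variance of a single predictor.
# NOTE: these heights are illustrative placeholders, not the article's table.
heights = [168, 172, 165, 170, 175, 152, 155, 150, 158, 154, 151]
genders = ["male"] * 5 + ["female"] * 6

def class_stats(xs, ys, label):
    """Mean and variance of xs restricted to the class `label`."""
    vals = [x for x, y in zip(xs, ys) if y == label]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, var

mu_m, var_m = class_stats(heights, genders, "male")
mu_f, var_f = class_stats(heights, genders, "female")
print(mu_m, var_m)   # 170.0 11.6
print(mu_f, var_f)
```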

Let’s now jump directly into linear discriminant analysis, where our main focus will be to train a model from the above data so that we can predict the gender of some other person, not present in the table, given his/her height. In other words, you should be able to answer the question: what will be the gender of a person whose height is, say, 152 cm?

LDA relies heavily on Bayes’ theorem which, as I said, is a prerequisite for understanding this article. Bayes’ theorem states that:

P(A1 | B) = P(B | A1) · P(A1) / [ P(B | A1) · P(A1) + P(B | A2) · P(A2) ]

I will try to explain it a bit. P(A1|B) is read as the probability of A1 given B. It means the probability of event A1 when event B has already occurred, e.g. the probability of rainfall when humidity is above 80% can be written as P(Rainfall | Humidity > 80%). P(B|A1) is the above situation flipped, i.e. the probability of high humidity when rainfall has already occurred. P(A1) is called the prior probability, in this case the probability of rainfall. An important point to note is that if A1 represents the occurrence of rainfall, A2 is the event of no rainfall (a two-class problem), and all the other terms take their usual meaning.
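The rainfall example above gives no actual figures, so here is a minimal numeric sketch of the two-class Bayes’ rule with assumed probabilities (all four numbers are made-up assumptions for illustration):

```python
# Two-class Bayes' theorem with hypothetical probabilities for the
# rainfall/humidity example -- the figures below are assumptions.
p_a1 = 0.3        # P(A1): prior probability of rainfall (assumed)
p_a2 = 1 - p_a1   # P(A2): prior probability of no rainfall
p_b_a1 = 0.9      # P(B|A1): humidity > 80% given rainfall (assumed)
p_b_a2 = 0.4      # P(B|A2): humidity > 80% given no rainfall (assumed)

# P(A1|B) = P(B|A1)P(A1) / (P(B|A1)P(A1) + P(B|A2)P(A2))
posterior = p_b_a1 * p_a1 / (p_b_a1 * p_a1 + p_b_a2 * p_a2)
print(round(posterior, 3))   # 0.491
```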

With that said about Bayes’ theorem, and assuming you have prior knowledge of it, let’s focus again on LDA. For the data table given to us we need to check the probability of a height value falling into the two gender classes. This means we have to calculate P(gender = male | height = 152) and P(gender = female | height = 152), and then check which probability is higher. We will first calculate the probability for the male class, which as per Bayes’ theorem is equal to:

P(gender = male | height = 152) = P(height = 152 | male) · P(male) / [ P(height = 152 | male) · P(male) + P(height = 152 | female) · P(female) ]   (eq. 2)

Let’s calculate the terms on the right-hand side of the equation one by one.

P(gender = male) is easily calculated as the number of elements in the male class of the training data set divided by the total number of elements, i.e. 5/11 = 0.454. Likewise, P(gender = female) = 6/11 = 0.545.

Now we have to calculate the conditional probability terms, which follow from the first assumption of LDA: the distribution of the variable in each class is normal. We know the equation of a normal curve is:

f(x) = (1 / √(2πσ²)) · exp( −(x − μ)² / (2σ²) )

where μ is the class mean and σ² is the class variance.
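The normal density is easy to code up directly; a minimal sketch:

```python
import math

def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x: exp(-(x-mu)^2 / (2*var)) / sqrt(2*pi*var)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Sanity check: the standard normal density peaks at 1/sqrt(2*pi) ~ 0.3989
print(round(normal_pdf(0, 0, 1), 4))   # 0.3989
```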

Let’s put the above mean and variance values into the Gaussian equation for both classes:

Plugging the derived values into (eq. 2), we have:

Now, as it is evident that P(gender = male | height = 152) is less than P(gender = female | height = 152), we can classify a height of 152 cm into the female class.

This is how linear discriminant analysis works. To give you a slightly more general view, we plug the distribution equations into the base equation (eq. 2) to see the model that is actually trained by this algorithm:

The same type of
equation can be used to find P(gender = female | height = x) for any value of
height.

**Steps in LDA model training:**

- Calculate the mean of the variable for each class.
- Calculate the variance of the variable for each class.
- Calculate the prior probability of each class.
- Use the means, variances, and prior probabilities to build the final model, assuming a normal distribution of the variable in each class.
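The four steps above can be sketched in a few lines. The heights and labels here are illustrative placeholders, not the article's actual table:

```python
from collections import defaultdict

def fit_lda(xs, ys):
    """Follow the four training steps: per-class mean, variance, prior."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[y].append(x)
    model = {}
    for label, vals in groups.items():
        mean = sum(vals) / len(vals)                           # step 1
        var = sum((v - mean) ** 2 for v in vals) / len(vals)   # step 2
        prior = len(vals) / len(xs)                            # step 3
        model[label] = (mean, var, prior)                      # step 4 inputs
    return model

# Placeholder heights/labels standing in for the article's table.
model = fit_lda([168, 172, 165, 170, 175, 152, 155, 150, 158, 154, 151],
                ["male"] * 5 + ["female"] * 6)
print(model["male"])   # (mean, variance, prior) for the male class
```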

**Linear for a reason**

Let’s do a few more calculations to prove another point. What will the probabilities be for height = 156 cm? Substitute the value into the above equations and you will find that the probabilities for the female and male classes are almost equal (0.5). That value of height acts as a threshold: all heights above 156 cm will be classified as male, and those below will be classified as female. A graphical representation is shown below:
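One way to locate such a threshold numerically is to bisect between the two class means until the male posterior crosses 0.5. The means, variances, and priors below are placeholder assumptions, so the boundary found here will not match the article's 156 cm:

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Placeholder class parameters (assumed, not the article's numbers).
mu_m, var_m, p_m = 170.0, 11.6, 5 / 11   # male: mean, variance, prior
mu_f, var_f, p_f = 153.3, 7.2, 6 / 11    # female: mean, variance, prior

def posterior_male(h):
    num = normal_pdf(h, mu_m, var_m) * p_m
    return num / (num + normal_pdf(h, mu_f, var_f) * p_f)

# Bisect between the class means for the height where P(male | h) = 0.5;
# that crossing point is the decision boundary.
lo, hi = mu_f, mu_m
for _ in range(60):
    mid = (lo + hi) / 2
    if posterior_male(mid) < 0.5:
        lo = mid
    else:
        hi = mid
threshold = (lo + hi) / 2
print(round(threshold, 1))   # threshold height in cm
```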

As is evident from the above graph, linear discriminant analysis always draws a straight, i.e. linear, separation boundary.

**Code:**
Below is Python code implementing what we have done so far:
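The original code block did not survive in this copy of the post, so here is a reconstruction of the procedure described above: a from-scratch sketch with placeholder data standing in for the article's table, not the author's original code:

```python
import math

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit(xs, ys):
    """Per-class (mean, variance, prior): the four LDA training steps."""
    model = {}
    for label in set(ys):
        vals = [x for x, y in zip(xs, ys) if y == label]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        model[label] = (mean, var, len(vals) / len(xs))
    return model

def predict(model, x):
    """Pick the class with the highest posterior P(class | x) via Bayes' theorem."""
    scores = {label: normal_pdf(x, mu, var) * prior
              for label, (mu, var, prior) in model.items()}
    total = sum(scores.values())
    posteriors = {label: s / total for label, s in scores.items()}
    return max(posteriors, key=posteriors.get), posteriors

# Placeholder data standing in for the article's height/gender table.
heights = [168, 172, 165, 170, 175, 152, 155, 150, 158, 154, 151]
genders = ["male"] * 5 + ["female"] * 6
model = fit(heights, genders)
label, post = predict(model, 152)
print(label)   # with this placeholder data, 152 cm falls in the female class
```

For a production setting you would instead use `sklearn.discriminant_analysis.LinearDiscriminantAnalysis`, which implements the same idea for many predictors and classes.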

Thanks

Please post your comments/suggestions

Have a good time :)
