Photo by Janita Sumeiko on Unsplash

- Discuss the idea behind logistic regression
- Explain it further through an example

What you are already supposed to know

**Idea:**

Suppose you are asked to predict a person’s gender from their height. To start
with, you are given data for 10 people whose height and gender are known. Your
task is to fit a mathematical model to this data so that you can predict the
gender of another person whose height is known but whose gender is not. This
type of problem falls under the classification domain of supervised machine
learning. If a problem asks you to assign observations to categories such as
True/False, rich/middle class/poor, or Failure/Success, you are dealing with a
classification problem. Its counterpart in machine learning is the regression
problem, where we predict a continuous value such as marks = 33.4% or
weight = 60 kg. This article discusses a classification algorithm called
logistic regression.

There are many classification algorithms of varying complexity, such as Linear
Discriminant Analysis, Decision Trees, and Random Forest, but logistic
regression is among the most basic and is a good starting point for learning
about classification models. Let’s return to the problem stated above and
suppose we are given the data of ten people as shown below:

| Height (cm) | Gender |
|---|---|
| 132 | Female |
| 134 | Male |
| 133 | Female |
| 139 | Female |
| 145 | Male |
| 144 | Male |
| 165 | Male |
| 160 | Male |
| 155 | Male |
| 140 | Female |

This problem differs from the usual numerical prediction problem: on one hand
we have continuous height values, and on the other we have categorical gender
values. Mathematical operations work on numbers, so categorical values pose a
challenge. To overcome this in classification problems, whether they are solved
through logistic regression or some other algorithm, we typically calculate the
probability associated with a class. In the given context, we calculate the
probability associated with either the male class or the female class. The
probability of the other class need not be calculated explicitly; it is
obtained by subtracting the calculated probability from one.

**Logistic Regression**

In the given data set, height is the independent variable and gender is the
dependent variable. If, for the time being, we treat this as a regression
problem, we would solve it by calculating the parameters of the regression
model given below:

$$\text{gender} = B_0 + B_1 \times \text{height}$$

In short, we would have calculated **B0** and **B1**, and the problem would be solved. A classification problem cannot be solved in this manner. As stated, we cannot calculate the value of gender directly, only the probability associated with a particular gender class. Logistic regression takes inspiration from linear regression and uses the linear model above to calculate that probability. We just need a function that takes the linear model as input and gives a probability as output. In mathematical form, we want something like this:

$$p(\text{gender} = \text{male}) = f(B_0 + B_1 \times \text{height})$$

The model above calculates the probability of the male class, but we could use
either of the two classes. The function on the right-hand side of the equation
must accept any real number as input and return an output between 0 and 1
only, since a probability cannot lie outside that range. This condition is
satisfied by the sigmoid, or logistic, function shown below:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
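
As a minimal sketch, the sigmoid function 1 / (1 + e^(-x)) can be written in a few lines of Python to check that it really squeezes any real input into the range 0 to 1. The function name *sigmoid* and the sample inputs are chosen here for illustration only.

```python
import numpy as np


def sigmoid(z):
    """Map any real number (or array of real numbers) into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# A few sample inputs, from strongly negative to strongly positive.
z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
p = sigmoid(z)

print(p)             # every value lies strictly between 0 and 1
print(sigmoid(0.0))  # 0.5, the midpoint of the curve
```

Large negative inputs give values close to 0, large positive inputs give values close to 1, and an input of 0 gives exactly 0.5.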

The sigmoid function has a domain of *-inf* to *inf* and a range of 0 to 1, which makes it well suited to probability calculation in logistic regression. If we plug the linear model into the sigmoid function, we get:

$$p(\text{gender} = \text{male}) = \frac{1}{1 + e^{-(B_0 + B_1 \times \text{height})}}$$

The above equation can be rearranged into a simpler and more easily understood
form:

$$\ln\left(\frac{p(\text{gender} = \text{male})}{1 - p(\text{gender} = \text{male})}\right) = B_0 + B_1 \times \text{height}$$
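
The rearrangement can be spelled out step by step. As a sketch, write $p$ for $p(\text{gender} = \text{male})$ and $z$ for the linear model $B_0 + B_1 \times \text{height}$; both symbols are shorthand introduced here for brevity.

```latex
\begin{aligned}
p &= \frac{1}{1 + e^{-z}} \\
1 + e^{-z} &= \frac{1}{p} \\
e^{-z} &= \frac{1 - p}{p} \\
e^{z} &= \frac{p}{1 - p} \\
z &= \ln\left(\frac{p}{1 - p}\right)
\end{aligned}
```

Substituting $z$ back gives $\ln\left(\frac{p}{1 - p}\right) = B_0 + B_1 \times \text{height}$.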

The right-hand side of the equation is exactly what we have in the linear
regression model, and the left-hand side is the log of the odds, also called
the logit. So the above equation can also be written as:

$$\text{logit}(\text{gender} = \text{male}) = B_0 + B_1 \times \text{height}$$

This is the idea behind logistic regression. Now let’s solve the problem given
to us to see it applied.

**Application**

We will use Python to train our model on the given data. Let’s first import
the necessary modules. We need *numpy* and the *LogisticRegression* class from
*sklearn*:

*import numpy as np*

*from sklearn.linear_model import LogisticRegression*

Now that the modules are imported, we need to create an instance of the LogisticRegression class:

*lg = LogisticRegression(solver = 'lbfgs')*

The solver used is **lbfgs**. It’s now time to create the data set that we will use to train the model.

*height = np.array([[132,134,133,139,145,144,165,160,155,140]])*

*gender = np.array([1,0,1,1,0,0,0,0,0,1])*

Note that sklearn can only handle numerical values, so here we represent the
female class with 1 and the male class with 0. With this coding, the
probability the fitted model calculates is that of the female class (class 1).
Using the above data, let’s train the model:

*lg.fit(height.reshape(-1,1),gender.ravel())*
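
Putting the steps so far together, a minimal self-contained sketch might look like the following. It uses the same data and solver as above, but starts from a one-dimensional *height* array and reshapes it into a column; the variable name *X* is introduced here for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Heights and gender labels from the article (1 = female, 0 = male).
height = np.array([132, 134, 133, 139, 145, 144, 165, 160, 155, 140])
gender = np.array([1, 0, 1, 1, 0, 0, 0, 0, 0, 1])

# scikit-learn expects a 2-D feature matrix: one row per person,
# one column per feature. reshape(-1, 1) turns 10 values into 10 rows.
X = height.reshape(-1, 1)

lg = LogisticRegression(solver="lbfgs")
lg.fit(X, gender)

print(X.shape)      # (10, 1)
print(lg.classes_)  # the class labels the model has learned
```

The reshape matters because *fit* treats each row as one observation; passing the ten heights as a single row would describe one person with ten features.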

Once the model is trained, *fit* returns the fitted model, which an
interactive session such as a Jupyter notebook displays as a confirmation. Now
that we have a trained model, let’s check its parameters, the intercept
(**B0**) and the slope (**B1**).

*lg.coef_*

*lg.intercept_*

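Running the above lines shows an intercept of 35.212 and a slope of -0.252.
Hence our trained model can be written as follows (female is the class coded
as 1, so the logit is that of the female class):

$$\text{logit}(\text{gender} = \text{female}) = 35.212 - 0.252 \times \text{height}$$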

We can use the above equation to predict the gender of any person given their
height, or we can use the trained model directly, as shown below, to predict
the gender of a person with height = 140 cm:

*lg.predict(np.array([[140]]))*
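
To see how the fitted parameters relate to the prediction, the following sketch recomputes the probability by hand from the intercept and slope and compares it with the model’s own *predict_proba* output. The names *b0*, *b1*, *new_height*, *p_manual* and *p_model* are illustrative. With the coding used here (1 = female), column 1 of *predict_proba* is the probability of the female class.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same data as in the article (1 = female, 0 = male).
height = np.array([132, 134, 133, 139, 145, 144, 165, 160, 155, 140]).reshape(-1, 1)
gender = np.array([1, 0, 1, 1, 0, 0, 0, 0, 0, 1])

lg = LogisticRegression(solver="lbfgs").fit(height, gender)

b0 = lg.intercept_[0]  # intercept
b1 = lg.coef_[0, 0]    # slope for the single feature, height

new_height = 140.0

# Probability of class 1 (female): the linear model passed through the sigmoid.
p_manual = 1.0 / (1.0 + np.exp(-(b0 + b1 * new_height)))

# predict_proba returns one column per class, in the order of lg.classes_.
p_model = lg.predict_proba(np.array([[new_height]]))[0]

print(p_manual)  # hand-computed P(female)
print(p_model)   # [P(male), P(female)] from the model; the two sum to 1
```

The second column of *p_model* should match *p_manual*, and the first column is simply one minus it, which is why only one class probability ever needs to be calculated.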

Give the above lines of code a try to get the idea. Note that the model actually gives us the probability associated with a given class (available through *lg.predict_proba*), and it’s up to us to decide the probability threshold. The default is 0.5, i.e. a male-class probability above 0.5 is classified as male and a male-class probability below 0.5 is classified as female. Also, the separation boundary in logistic regression is linear, which can be easily confirmed graphically.
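
The threshold and the boundary can both be explored in code. The sketch below assumes the same data and coding as above (1 = female). The 0.7 threshold is a hypothetical value picked only to show the effect, and *boundary*, *p_female*, *labels_default* and *labels_strict* are illustrative names.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same data as in the article (1 = female, 0 = male).
height = np.array([132, 134, 133, 139, 145, 144, 165, 160, 155, 140]).reshape(-1, 1)
gender = np.array([1, 0, 1, 1, 0, 0, 0, 0, 0, 1])

lg = LogisticRegression(solver="lbfgs").fit(height, gender)
b0, b1 = lg.intercept_[0], lg.coef_[0, 0]

# A probability of 0.5 corresponds to logit = 0, i.e. b0 + b1 * height = 0.
# With one feature, the linear boundary is therefore a single height value.
boundary = -b0 / b1

# Column 1 of predict_proba is P(class 1), here P(female).
p_female = lg.predict_proba(height)[:, 1]

# Default rule: label as class 1 when its probability exceeds 0.5.
labels_default = (p_female > 0.5).astype(int)

# Hypothetical stricter rule: require more confidence before labelling class 1.
threshold = 0.7
labels_strict = (p_female > threshold).astype(int)

print(boundary)
print(labels_default)
print(labels_strict)
```

Raising the threshold can only move people out of class 1, never into it, so the stricter rule labels at most as many people female as the default rule does.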

**Further Read**

That is all for Logistic Regression. For any queries
regarding the article, you can reach me on **LinkedIn**

Thanks,

Have a nice time 😊
