Prediction using Logistic Regression in Python

Introduction

 

In this blog, we discuss how to implement the Logistic Regression algorithm to predict whether people buy a product. From the previous blog, we know that logistic regression is used to predict a categorical dependent variable and solve classification problems. We use the built-in logistic regression class from the sklearn library for the prediction.

 

Libraries Required

 

  1. Numpy
  2. Pandas
  3. Matplotlib
  4. Seaborn
  5. sklearn

 

Implementation of Logistic Regression to predict whether people buy a product

 

Step-1: Importing libraries and the dataset

 

To start with prediction using logistic regression, the first step is to get the dataset. The dataset records whether people bought a product, along with their age, gender, and estimated annual salary. The dataset can be downloaded at Dataset.

 

# import the required libraries

import numpy as np
import pandas as pd 

# import the visualizing libraries

import matplotlib.pyplot as plt
import seaborn as sns

# read the dataset 

df = pd.read_csv('path for your file')
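
As a quick sanity check, it helps to look at the first few rows and the column types before doing anything else. The snippet below is a minimal sketch; it assumes the dataset has the columns used later in this post (User ID, Gender, Age, EstimatedSalary, Purchased).

# preview the first five rows of the dataset
print(df.head())

# column names, non-null counts, and dtypes
df.info()

# summary statistics of the numeric columns
print(df.describe())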

 

Step-2: Preprocessing the data 

 

First, we check for any missing values in the data. If a column has many missing values, we remove it.

 

# checking for null values in each column
df.isnull().sum()
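
For this dataset the check usually shows no missing values, so nothing needs to be dropped. If a column did have many missing values, one simple approach is to drop any column whose missing fraction exceeds a threshold. The sketch below illustrates this with an arbitrary 50% cutoff (the threshold is an assumption, not part of the original post).

# fraction of missing values per column
missing_fraction = df.isnull().mean()

# hypothetical threshold: drop columns with more than 50% missing values
cols_to_drop = missing_fraction[missing_fraction > 0.5].index
df = df.drop(columns=cols_to_drop)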

 

Now, we check for outliers in the Age and EstimatedSalary columns using box plots.

 

# checking for outliers in the Age and EstimatedSalary columns

fig, axes = plt.subplots(1, 2, figsize=(15, 10))

# box plot of the Age column
sns.boxplot(ax=axes[0], x=df['Age'], palette="Set1")
axes[0].set_title('Age Of People')

# box plot of the EstimatedSalary column
sns.boxplot(ax=axes[1], data=df['EstimatedSalary'])
axes[1].set_title('Estimated Salary Of People')

plt.show()
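
As a complementary numeric check (an addition, not part of the original post), the interquartile range rule can count how many points fall outside the 1.5×IQR whiskers shown in the box plots; this is only a rough diagnostic.

# count points outside the 1.5 * IQR whiskers for a given column
def count_outliers(series):
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return ((series < lower) | (series > upper)).sum()

print("Outliers in Age:", count_outliers(df['Age']))
print("Outliers in EstimatedSalary:", count_outliers(df['EstimatedSalary']))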

 

In the next step, we check the correlation between the columns and drop the columns with low correlation to the target, as they don’t contribute to the prediction.

 

# correlation matrix of the numeric columns
# (numeric_only=True, available in pandas >= 1.5, skips the Gender column while it is still a string)
df.corr(numeric_only=True)

f, ax = plt.subplots(figsize=(6, 5))

# heatmap of the correlation matrix
sns.heatmap(df.corr(numeric_only=True), annot=True, linewidths=0.5, linecolor="red", fmt='.1f', cmap='viridis', ax=ax)
plt.show()

 

We drop the User ID column as it doesn’t contribute to the prediction, and encode the Gender column numerically (Male as 0, Female as 1).

 

# preparing the data

# drop the User ID column
df.drop('User ID', axis=1, inplace=True)

# encode Gender as a numeric column: Male -> 0, Female -> 1
label = {'Male': 0, 'Female': 1}
df['Gender'] = df['Gender'].replace(label)

 

Step-3: Splitting the dataset

 

X is set to all the columns except the Purchased column, and y is set to the Purchased column, which we are going to predict.

 

# set inputs and outputs

X = df.drop('Purchased',axis = 1)     
y = df['Purchased']

 

With the help of StandardScaler, the data can be standardized so that each transformed feature has a mean of 0 and a standard deviation of 1.

 

# we import StandardScaler from the sklearn library
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

# fit the scaler to the features and transform them
d_scaled = scaler.fit_transform(X)

# wrap the scaled array in a DataFrame, keeping the original column names
data_scaled1 = pd.DataFrame(d_scaled, columns=X.columns)
data_scaled1.head()
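
As a quick check of the claim above (a small addition, not in the original post), the per-column mean and standard deviation of the scaled data should be approximately 0 and 1:

# each column of the scaled data should have mean ~0 and std ~1
print(data_scaled1.mean().round(3))
print(data_scaled1.std().round(3))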

 

We split the data into training and testing sets. It is usually good practice to keep 70% of the data in the training set and the remaining 30% in the test set.

 

# we import train_test_split from sklearn
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(d_scaled, y, test_size=0.30, random_state=4)
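
To confirm the 70/30 split (a small check added here, not part of the original post), the shapes of the resulting arrays can be printed:

# the training set should hold ~70% of the rows, the test set ~30%
print("Train:", X_train.shape, "Test:", X_test.shape)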

 

Step-4: Building the Logistic Regression model and testing it on the test data

 

We use the built-in LogisticRegression class from the sklearn library to build the model and fit it on the training data.

 

# we import the LogisticRegression class from the sklearn library
from sklearn.linear_model import LogisticRegression

# C is the inverse of the regularization strength; max_iter caps the solver iterations
model = LogisticRegression(C=0.1, max_iter=500)
model.fit(X_train, y_train)
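
After fitting, the learned parameters can be inspected; sklearn exposes them as the coef_ and intercept_ attributes. This is a small sketch added here, not part of the original post:

# one learned weight per feature (in the order of the columns in X) plus the bias term
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)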

 

We test the model on the test data, that is, we predict whether each person buys the product.

 

# Predicting Values of test data using the above trained model

y_pred = model.predict(X_test)
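
Besides the hard 0/1 predictions, logistic regression also provides class probabilities via predict_proba. The snippet below (an addition, not from the original post) shows the predicted probability of purchase for the first few test samples:

# probability of class 1 (purchase) for the first five test samples
proba = model.predict_proba(X_test)[:, 1]
print(proba[:5].round(3))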

 

Step-5: Evaluation Metrics

 

Evaluation metrics show how well the model is performing. We use accuracy, precision, recall, and F1-score.

 

# we import the accuracy, precision, recall and f1 score functions from the sklearn library
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
print("F1 Score:", f1_score(y_test, y_pred))

 

Also read: Salary prediction using Lasso and Ridge
