Sales Prediction using Linear Regression in Python

Abhishek Sai, 12/08/2022 (updated 18/08/2022)

Introduction

In this blog, we discuss how to implement the linear regression algorithm to predict sales. As discussed in the previous blog, linear regression finds the linear relationship between a dependent variable and one or more independent variables by determining the best-fit line through the data. We use the built-in linear regression implementations of the statsmodels and sklearn libraries to make the predictions.
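For a single feature, the best-fit line has a closed-form solution: the slope is Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², and the intercept is chosen so that the line passes through the point of means (x̄, ȳ). The sketch below computes both with NumPy; the helper name `fit_simple_ols` and the toy numbers are illustrative assumptions, not part of the original post.

```python
import numpy as np


def fit_simple_ols(x, y):
    """Least-squares fit of y = b0 + b1 * x for one feature (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: sum of cross-deviations divided by the sum of squared x-deviations
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through (x_mean, y_mean)
    b0 = y_mean - b1 * x_mean
    return b0, b1


# Toy data generated from y = 2 + 3x with no noise (illustrative only)
x_toy = [0.0, 1.0, 2.0, 3.0, 4.0]
y_toy = [2.0, 5.0, 8.0, 11.0, 14.0]
b0, b1 = fit_simple_ols(x_toy, y_toy)
print(f"intercept={b0:.3f}, slope={b1:.3f}")
```

On noise-free data like this the fit recovers the generating line exactly; on real data it gives the line that minimizes the sum of squared residuals.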

Libraries Required

NumPy
Pandas
Matplotlib
Seaborn
scikit-learn (sklearn)
statsmodels

Implementation of Sales Prediction using Linear Regression

Step-1: Importing Libraries and dataset

First, we import the required libraries and load the dataset, which can be downloaded as Advertising.csv.

# Import the numpy and pandas package

import numpy as np 
import pandas as pd

# Data Visualisation 

import matplotlib.pyplot as plt 
import seaborn as sns 

# read_csv already returns a DataFrame; replace the file name with the path to your copy of the dataset
advertising = pd.read_csv("Advertising.csv")

# displays the first 5 rows of the dataset 

advertising.head()

Step-2: Preprocessing the data

In this step, we check the data for missing values and look for outliers.

# Percentage of missing values in each column

advertising.isnull().sum()*100/advertising.shape[0]
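The missing-value check divides the number of nulls in each column by the number of rows and multiplies by 100. The sketch below runs it on a small hypothetical DataFrame (not the Advertising data) and shows an equivalent shorthand, `df.isnull().mean() * 100`, which works because the mean of a boolean column is the fraction of `True` values.

```python
import numpy as np
import pandas as pd

# Small hypothetical DataFrame with some missing entries
df = pd.DataFrame({
    "TV": [100.0, np.nan, 50.0, 20.0],
    "Radio": [10.0, 20.0, np.nan, np.nan],
    "Sales": [12.0, 8.0, 7.0, 15.0],
})

# Percentage of missing values per column, computed as in the post
pct_missing = df.isnull().sum() * 100 / df.shape[0]

# Equivalent shorthand: the mean of a boolean mask is the fraction of True values
pct_missing_alt = df.isnull().mean() * 100

print(pct_missing)
```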

# Outlier Analysis: one box plot per advertising channel

fig, axs = plt.subplots(3, figsize = (5,5))
sns.boxplot(x = advertising['TV'], ax = axs[0])
sns.boxplot(x = advertising['Newspaper'], ax = axs[1])
sns.boxplot(x = advertising['Radio'], ax = axs[2])
plt.tight_layout()
plt.show()
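Box plots flag as outliers the points beyond the whiskers, which by default in matplotlib and seaborn extend at most 1.5 × IQR (interquartile range) past the first and third quartiles. The hypothetical helper `iqr_outliers` below applies that rule (Tukey's fences) numerically, so you can list the flagged values instead of reading them off the plot. Quartile conventions differ slightly between tools, so treat it as an approximation of what the plot shows.

```python
import numpy as np


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Keep only the points that fall outside the fences
    return values[(values < lower) | (values > upper)]


# Hypothetical sample with one extreme value
sample = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(sample))
```

Calling it as `iqr_outliers(advertising['Newspaper'])` would list the flagged values for one channel.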

Step-3: Exploratory Data Analysis

In this step, we explore the data. Exploratory data analysis (EDA) means summarizing and visualizing the data, for example its distributions, ranges and outliers, so that we understand what we are working with before building a model.

# describe() reports the count, mean, standard deviation, minimum, quartiles (the 50% row is the median) and maximum of each numeric column

advertising.describe()

# check for outliers in the target variable

sns.boxplot(x = advertising['Sales'])
plt.show()
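The post goes on to use TV as the single feature. One way to support that choice during EDA is to rank each feature by its correlation with the target. The sketch below is an assumption on our part rather than a step from the original post: the helper `correlations_with_target` and the toy data are hypothetical, and `DataFrame.corr(numeric_only=True)` requires pandas 1.5 or later.

```python
import pandas as pd


def correlations_with_target(df, target):
    """Pearson correlation of each other numeric column with the target, strongest first."""
    corr = df.corr(numeric_only=True)[target].drop(target)
    # Sort by absolute strength so strong negative correlations also rank high
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Hypothetical toy data: Sales tracks TV exactly, Newspaper only loosely
toy = pd.DataFrame({
    "TV": [10.0, 20.0, 30.0, 40.0, 50.0],
    "Newspaper": [5.0, 3.0, 6.0, 2.0, 4.0],
    "Sales": [3.0, 5.0, 7.0, 9.0, 11.0],
})
result = correlations_with_target(toy, "Sales")
print(result)
```

On the real data this would be called as `correlations_with_target(advertising, 'Sales')`.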

Step-4: Splitting the dataset

We assign the feature variable TV to X and the response variable Sales to y, and then split them into training and testing sets.

# assigning the variables

X = advertising['TV']
y = advertising['Sales']

# split the variables into training and testing sets; a 70/30 train/test split is a common choice, and random_state makes the split reproducible

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size = 0.7, test_size = 0.3, random_state = 100)
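Conceptually, a random 70/30 holdout split shuffles the row indices and cuts them at 70%. The sketch below shows that idea with NumPy; the helper `shuffle_split` is hypothetical, and sklearn's `train_test_split` uses its own random number generator and rounding rules, so the same seed will not reproduce its exact split.

```python
import numpy as np


def shuffle_split(n_samples, train_size=0.7, seed=100):
    """Return shuffled train/test index arrays for a holdout split (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)  # random order of 0..n_samples-1
    n_train = int(round(train_size * n_samples))
    return indices[:n_train], indices[n_train:]


# Example for a dataset with 200 rows
train_idx, test_idx = shuffle_split(200)
print(len(train_idx), len(test_idx))
```

The index arrays could then select rows with `X.iloc[train_idx]` and `X.iloc[test_idx]`.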

Step-5: Building the model

Using the statsmodels.api library

First, import the statsmodels library. statsmodels' OLS class does not add an intercept on its own, so you need to add a constant column to the features manually with sm.add_constant. Once the constant has been added to X_train, you can fit a regression line using OLS (ordinary least squares).

# importing the library

import statsmodels.api as sm

# Add a constant to get an intercept
X_train_sm = sm.add_constant(X_train)

# Fit the regression line using 'OLS'
lr = sm.OLS(y_train, X_train_sm).fit()

# The summary lists the fitted coefficients with their standard errors, t-statistics and p-values, plus R-squared
print(lr.summary())
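To see why the constant matters, the sketch below mirrors what sm.add_constant does by prepending a column of ones to the feature and solving the least-squares problem with NumPy's `np.linalg.lstsq`. It then fits the same toy data without the column of ones, which forces the line through the origin and distorts the slope. The helper names and toy data are hypothetical, and this illustrates the idea rather than how statsmodels solves OLS internally.

```python
import numpy as np


def ols_with_intercept(x, y):
    """Fit y = b0 + b1*x on a design matrix with a column of ones (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones, similar in spirit to sm.add_constant
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # array([intercept, slope])


def ols_without_intercept(x, y):
    """Same fit with no constant column: the line is forced through the origin."""
    X = np.asarray(x, dtype=float).reshape(-1, 1)
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef  # array([slope])


x_toy = [1.0, 2.0, 3.0, 4.0]
y_toy = [7.0, 9.0, 11.0, 13.0]  # exactly y = 5 + 2x
print(ols_with_intercept(x_toy, y_toy))
print(ols_without_intercept(x_toy, y_toy))
```

With the constant, the fit recovers the intercept 5 and slope 2; without it, the slope is pushed to about 3.67 to compensate for the missing intercept.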

Using sklearn Library

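We can fit the same model with the sklearn library. First, we import the linear_model module from sklearn, create a LinearRegression object and fit it to the training data. sklearn expects the features as a 2D array and fits an intercept by default, so no constant column is needed.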

from sklearn import linear_model

# create linear regression object
reg = linear_model.LinearRegression()

# sklearn expects a 2D feature array of shape (n_samples, n_features)
X_train_2d = X_train.values.reshape(-1, 1)

# train the model using the training sets
reg.fit(X_train_2d, y_train)

# fitted intercept and slope
print(reg.intercept_, reg.coef_)

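Both libraries fit the same ordinary least squares model, so they estimate the same coefficients. From these parameters, our linear regression equation becomes Sales = 6.948 + 0.054 × TV. Let's visualize how well this line fits the training data.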

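# plot the training data and the fitted line, using the coefficients estimated by statsmodels
plt.scatter(X_train, y_train)
plt.plot(X_train, lr.params['const'] + lr.params['TV'] * X_train, 'r')
plt.xlabel('TV')
plt.ylabel('Sales')
plt.show()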
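Using the fitted equation for a single prediction is plain arithmetic: with a TV value of 100, the predicted sales are 6.948 + 0.054 × 100 = 12.348, in the dataset's units. The hypothetical helper below evaluates the equation with the coefficients quoted above. In practice you would read them from `lr.params` (or `reg.intercept_` and `reg.coef_`) rather than hard-coding them.

```python
import numpy as np


def predict_sales(tv, intercept=6.948, slope=0.054):
    """Evaluate Sales = intercept + slope * TV (hypothetical helper, coefficients from the post)."""
    return intercept + slope * np.asarray(tv, dtype=float)


print(predict_sales(100))
print(predict_sales([0, 100, 200]))  # works element-wise on arrays too
```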

Step-6: Predictions on the Test Set

Once the regression line has been fitted on the training data, the next step is to make predictions on the test data. First add a constant to X_test, just as you did for X_train, and then predict the y values for X_test with the predict method of the fitted statsmodels model.

# Add a constant to X_test
X_test_sm = sm.add_constant(X_test)

# Predict the y values corresponding to X_test_sm
y_pred = lr.predict(X_test_sm)

Let us visualize the fit on the test set.
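The plotting code for this step is not included in the post. A minimal sketch, assuming you pass in X_test, y_test and y_pred from the previous step: the hypothetical helper `plot_fit` scatters the actual test values and draws the predictions as a red line, sorting by x first so the line is drawn left to right.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_fit(x, y_true, y_pred, title="Fit on the test set"):
    """Scatter the observed values and draw the predictions as a line (hypothetical helper)."""
    x = np.asarray(x, dtype=float).ravel()
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    order = np.argsort(x)  # sort so the prediction line is drawn left to right
    fig, ax = plt.subplots()
    ax.scatter(x, y_true, label="actual")
    ax.plot(x[order], y_pred[order], "r", label="predicted")
    ax.set_xlabel("TV")
    ax.set_ylabel("Sales")
    ax.set_title(title)
    ax.legend()
    return fig, ax


# In the post's workflow: fig, ax = plot_fit(X_test, y_test, y_pred)
fig, ax = plot_fit([3.0, 1.0, 2.0], [8.0, 4.5, 6.0], [8.0, 4.0, 6.0])
```

In a script, call `plt.show()` afterwards to display the figure.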

Step-7: Evaluation Metrics

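Evaluation metrics show how well the model performs on the test data. We use the mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE) and the coefficient of determination (R squared) as our evaluation metrics.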
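Each metric can be computed directly from its definition: MAE is the mean of the absolute errors, MSE the mean of the squared errors, RMSE the square root of MSE, and R squared equals 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean. The hypothetical helper below computes all four with NumPy on toy values, as a cross-check for the sklearn.metrics functions.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, RMSE and R^2 from their definitions (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mae = np.mean(np.abs(errors))
    mse = np.mean(errors ** 2)
    rmse = np.sqrt(mse)
    # R^2 = 1 - (residual sum of squares) / (total sum of squares around the mean)
    ss_res = np.sum(errors ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.0, 5.0, 8.0, 9.0])
print(metrics)
```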

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

mae = mean_absolute_error(y_true=y_test, y_pred=y_pred)
mse = mean_squared_error(y_true=y_test, y_pred=y_pred)
# RMSE is the square root of MSE (the squared=False argument is deprecated in recent scikit-learn releases)
rmse = np.sqrt(mse)
r_squared = r2_score(y_test, y_pred)

print("MAE:", mae)
print("MSE:", mse)
print("RMSE:", rmse)
print("R squared:", r_squared)

Also read: Linear Regression
