Implementation of SVM in Python
Introduction
In this blog, we discuss the implementation of SVM in Python. SVM, or Support Vector Machine, is a powerful machine learning algorithm that can be used for both classification and regression tasks. The key idea behind SVM is to find the hyperplane that separates the data points of one class from those of the other class with the largest possible margin. In binary classification with two features, this hyperplane is simply a line; with more features, it becomes a plane or a higher-dimensional hyperplane. SVM can also be used for multi-class classification by combining several binary classifiers.
What is SVM?
SVM works by finding the hyperplane that best separates our data points into classes. This hyperplane is also known as the decision boundary. To find it, SVM minimizes a cost function called the hinge loss, which penalizes points that are misclassified or that fall inside the margin. SVM is a powerful tool because it can find decision boundaries in high-dimensional spaces. This is important because many datasets are not linearly separable in their original feature space. By using the kernel trick, SVM implicitly maps the data into a higher-dimensional space where a linear boundary can be found, which corresponds to a non-linear boundary in the original space.
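To make the hinge loss concrete, here is a minimal sketch in plain Python. It assumes labels encoded as -1 and +1 and a raw decision score for each point (the signed output of w.x + b). The function name hinge_loss and the sample values are illustrative and not part of any library.

```python
def hinge_loss(y_true, scores):
    """Mean hinge loss for labels in {-1, +1} and raw decision scores."""
    losses = [max(0.0, 1.0 - y * s) for y, s in zip(y_true, scores)]
    return sum(losses) / len(losses)


# Illustrative values: the label and the signed score of four points.
y_true = [1, 1, -1, -1]
scores = [2.0, 0.5, -3.0, 0.2]

# Point 1: correct and outside the margin, so its loss is zero.
# Point 2: correct but inside the margin, so it gets a small loss.
# Point 3: correct and outside the margin, so its loss is zero.
# Point 4: on the wrong side of the boundary, so it gets the largest loss.
print(hinge_loss(y_true, scores))
```

Points that are correctly classified with a score beyond the margin contribute nothing, so only the points near or across the boundary shape the solution.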
There are a few things to keep in mind when using SVM for classification. First, SVM is sensitive to the scale of the data, so it is important to scale your features before training. Second, SVM can require a lot of memory and be slow to train on large datasets. Finally, SVM is natively a binary classifier; multi-class problems are handled by combining several binary classifiers, for example with a one-vs-one or one-vs-rest scheme. Despite these limitations, SVM is a powerful tool that can be used to achieve high accuracy on many classification tasks.
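Although each individual SVM separates only two classes, scikit-learn's SVC handles more than two classes by training one binary classifier per pair of classes. The sketch below illustrates this on a synthetic four-class dataset. The data from make_blobs and all variable names are assumptions for demonstration and are unrelated to the dataset used later in this post.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic data with four classes (illustrative only).
X_demo, y_demo = make_blobs(n_samples=200, centers=4, random_state=0)

# With decision_function_shape='ovo', the decision function exposes
# one score per pair of classes.
multi_clf = SVC(kernel='linear', decision_function_shape='ovo')
multi_clf.fit(X_demo, y_demo)

n_classes = len(multi_clf.classes_)
pair_scores = multi_clf.decision_function(X_demo)

print(n_classes)          # number of classes seen during fit
print(pair_scores.shape)  # one column per pair: n_classes * (n_classes - 1) / 2
```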
Implementation of SVM
Firstly, we import all the required libraries.
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
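Then we load the dataset and divide it into the features X and the labels Y. Columns 2 and 3 (plotted later as Age and Estimated Salary) are used as features, and column 4 is used as the label. You can download the dataset at Dataset.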
# Importing the datasets and preprocessing
datasets = pd.read_csv('Social_Network_Ads.csv')
X = datasets.iloc[:, [2, 3]].values
Y = datasets.iloc[:, 4].values
We then split the dataset into training and testing sets.
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, Y, test_size = 0.25, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_Train = sc_X.fit_transform(X_Train)
X_Test = sc_X.transform(X_Test)
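As a rough sketch of what the scaler does, the following NumPy code standardizes a small made-up array by hand. The numbers are illustrative and the variable names are hypothetical. The point is that the mean and standard deviation are computed on the training data only and then reused for the test data, which is why the code above calls fit_transform on X_Train but only transform on X_Test.

```python
import numpy as np

# Made-up [age, salary] rows, for illustration only.
train = np.array([[25.0, 20000.0], [35.0, 60000.0], [45.0, 100000.0]])
test = np.array([[30.0, 40000.0]])

# Statistics come from the training set only.
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation, as StandardScaler uses

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuse the training statistics

print(train_scaled.mean(axis=0))  # approximately 0 for each column
print(train_scaled.std(axis=0))   # approximately 1 for each column
print(test_scaled)
```

After scaling, age and salary are on comparable ranges, so neither feature dominates the distance computations that SVM relies on.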
We import the built-in SVM classifier (SVC) from sklearn, fit it to the training set, and predict the labels of the test set.
# Fitting the classifier to the Training set
from sklearn.svm import SVC
classifier = SVC(kernel = 'linear', random_state = 0)
classifier.fit(X_Train, Y_Train)

# Predicting the test set results
Y_Pred = classifier.predict(X_Test)
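The classifier above uses a linear kernel. As a hedged illustration of the kernel trick mentioned earlier, the sketch below compares a linear and an RBF kernel on synthetic data that no straight line can separate. The make_circles dataset and the variable names are assumptions for demonstration and are unrelated to Social_Network_Ads.csv.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: not separable by a straight line.
X_c, y_c = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
Xc_train, Xc_test, yc_train, yc_test = train_test_split(
    X_c, y_c, test_size=0.25, random_state=0)

# Same data, two kernels; score() returns the accuracy on the test split.
linear_acc = SVC(kernel='linear').fit(Xc_train, yc_train).score(Xc_test, yc_test)
rbf_acc = SVC(kernel='rbf').fit(Xc_train, yc_train).score(Xc_test, yc_test)

print('linear kernel accuracy:', linear_acc)
print('rbf kernel accuracy:', rbf_acc)
```

On data like this, the RBF kernel should score clearly higher, because it can draw a closed boundary around the inner ring. Whether a non-linear kernel helps on the dataset used in this post is something to check experimentally.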
We use the confusion matrix as the evaluation metric.
# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(Y_Test, Y_Pred)
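To show how a confusion matrix is read, here is a small sketch that derives accuracy, precision, and recall from a 2x2 matrix. The counts are made up for illustration and are not the output of the model above. The layout follows scikit-learn's convention, where rows are true classes and columns are predicted classes.

```python
# Hypothetical 2x2 confusion matrix: [[TN, FP], [FN, TP]].
cm_example = [[50, 10],
              [5, 35]]

tn, fp = cm_example[0]
fn, tp = cm_example[1]

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of predicted positives that are correct
recall = tp / (tp + fn)                     # share of actual positives that are found

print(accuracy, precision, recall)
```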
# Visualising the Training set results
from matplotlib.colors import ListedColormap
X_Set, Y_Set = X_Train, Y_Train
# Build a fine grid over the (scaled) feature space
X1, X2 = np.meshgrid(np.arange(start = X_Set[:, 0].min() - 1, stop = X_Set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_Set[:, 1].min() - 1, stop = X_Set[:, 1].max() + 1, step = 0.01))
# Colour each grid point by its predicted class to show the decision regions
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
# Plot the actual points, one colour per class
for i, j in enumerate(np.unique(Y_Set)):
    plt.scatter(X_Set[Y_Set == j, 0], X_Set[Y_Set == j, 1],
                color = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Support Vector Machine (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()
# Visualising the Test set results
from matplotlib.colors import ListedColormap
X_Set, Y_Set = X_Test, Y_Test
# Build a fine grid over the (scaled) feature space
X1, X2 = np.meshgrid(np.arange(start = X_Set[:, 0].min() - 1, stop = X_Set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_Set[:, 1].min() - 1, stop = X_Set[:, 1].max() + 1, step = 0.01))
# Colour each grid point by its predicted class to show the decision regions
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
# Plot the actual points, one colour per class
for i, j in enumerate(np.unique(Y_Set)):
    plt.scatter(X_Set[Y_Set == j, 0], X_Set[Y_Set == j, 1],
                color = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Support Vector Machine (Test set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()
Also read: Implementation of K Nearest Neighbors in Python.