Implementation of K Nearest Neighbors in Python

Introduction

In this blog, we discuss the implementation of K Nearest Neighbors in Python. K Nearest Neighbors (KNN) is a non-parametric technique for classification and regression. In both cases, the input consists of the k closest training examples in the feature space, and the output depends on whether KNN is used for classification or regression: in KNN classification, the output is a class membership. An object is classified by a majority vote of its neighbors and assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, the object is simply assigned to the class of its single nearest neighbor. In KNN regression, the output is a property value for the object, computed as the average of the values of its k nearest neighbors.

Because KNN is non-parametric, it makes no assumptions about the form of the underlying data distribution. For classification, the algorithm looks at the k nearest neighbors of a new data point and assigns the point to the class that most of those neighbors belong to; for regression, it predicts the target value as the average of the neighbors' values. The algorithm is easy to understand and implement, but choosing the right value of k matters: if k is too small, the model is very sensitive to noise and tends to overfit the training data; if k is too large, the decision boundary becomes overly smooth and the model underfits.
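
To make the voting mechanics concrete, here is a minimal from-scratch sketch of the classification case using NumPy. It is only an illustration of the idea: the function name knn_predict, the choice of Euclidean distance, and the default k = 3 are assumptions for this example, not part of the scikit-learn code used in the rest of the post.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Euclidean distance from the new point to every training point
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # classification: majority vote among the k nearest labels
    # (for regression you would instead return y_train[nearest].mean())
    return Counter(y_train[nearest]).most_common(1)[0][0]

In practice we rarely write this by hand; scikit-learn's KNeighborsClassifier, used below, implements the same idea with efficient neighbor search.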


Implementation of K Nearest Neighbors

First, we load the Iris dataset from the sklearn library:

from sklearn.datasets import load_iris
iris_dataset = load_iris()
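
load_iris returns a Bunch object that behaves like a dictionary. A quick way to inspect what it contains (the Iris dataset bundled with scikit-learn has 150 samples and 4 features) is:

# the returned object behaves like a dictionary
print(iris_dataset.keys())
print(iris_dataset['data'].shape)      # (150, 4): 150 samples, 4 features
print(iris_dataset['target_names'])    # ['setosa' 'versicolor' 'virginica']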

 

Splitting the dataset

 

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], random_state=0)
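
By default, train_test_split keeps 25% of the samples for the test set, so for the 150 Iris samples we can sanity-check the shapes of the splits:

print("X_train shape: {}".format(X_train.shape))   # expected: (112, 4)
print("X_test shape: {}".format(X_test.shape))     # expected: (38, 4)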

 

Visualization of the dataset

 

import seaborn as sns
import matplotlib.pyplot as plt

# load the Iris dataset as a pandas DataFrame
iris = sns.load_dataset('iris')

# visualization: sepal length vs. petal length, colored by species
sns.set_style("whitegrid")
sns.FacetGrid(iris, hue="species", height=6) \
    .map(plt.scatter, 'sepal_length', 'petal_length') \
    .add_legend()
plt.show()
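
If you want a broader view than a single scatter plot, seaborn's pairplot draws every pairwise combination of the four features, colored by species. This is an optional extra step, not required for the model below.

# pairwise scatter plots of all four features, colored by species
sns.pairplot(iris, hue="species", height=2)
plt.show()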


Fitting the model to the training data

 

# importing the model
from sklearn.neighbors import KNeighborsClassifier

# k = 1: each new point gets the class of its single nearest neighbor
knn = KNeighborsClassifier(n_neighbors=1)

# fitting the model on the training data
knn.fit(X_train, y_train)

 

# predicting the test data
y_pred = knn.predict(X_test)
print("Test set predictions:\n {}".format(y_pred))
print("Predicted target names: {}".format(iris_dataset['target_names'][y_pred]))

# getting the score of test data
print("Test set score: {:.2f}".format(knn.score(X_test, y_test)))


Advantages of K Nearest Neighbors

 

  1. KNN is very versatile and can be used for a variety of tasks.
  2. KNN is easy to implement and understand.
  3. With a suitably chosen value of k, KNN is relatively resistant to overfitting.


Disadvantages of K Nearest Neighbors

 

  1. KNN can be computationally expensive, especially when working with large datasets (see the sketch after this list).
  2. KNN can be sensitive to outliers and noise in the data.
  3. KNN may have difficulty with high-dimensional data (i.e. data with many features).
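
On the computational cost point, scikit-learn lets you switch the neighbor search from brute force to a tree-based index via the algorithm parameter. The sketch below is illustrative (the value of n_neighbors is arbitrary here), and whether a k-d tree actually helps depends on the dataset's size and dimensionality.

# a tree-based index can speed up neighbor search on larger datasets
knn_fast = KNeighborsClassifier(n_neighbors=5, algorithm='kd_tree')
knn_fast.fit(X_train, y_train)
print("Test set score: {:.2f}".format(knn_fast.score(X_test, y_test)))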


Also, read – Implementation of Decision Trees in Python

 
