Implementation of K Nearest Neighbors in Python
Introduction
In this blog, we discuss the implementation of K Nearest Neighbors in Python. K-nearest neighbors (KNN) is a non-parametric technique for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression: in k-NN classification, the output is a class membership. An object is classified by a majority vote of its neighbors, being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, the object is simply assigned to the class of its single nearest neighbor. In k-NN regression, the output is a property value for the object: the average of the values of its k nearest neighbors.
KNN is a powerful tool for both classification and regression, but it is important to remember that it is a non-parametric method: it makes no assumptions about the underlying data distribution. For classification, the algorithm looks at the k nearest neighbors of a new data point and assigns the point to the class that most of those neighbors belong to. For regression, it predicts the value of the target variable for a new data point by averaging the values of its k nearest neighbors. The algorithm is easy to understand and implement, but choosing the right value of k is important: if k is too small, the model will overfit the training data; if k is too large, the decision becomes overly smooth and the model underfits.
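To make the idea concrete, here is a minimal from-scratch sketch of both variants, assuming the training data are NumPy arrays; the function names knn_predict_class and knn_predict_value are purely illustrative, not part of any library:

import numpy as np
from collections import Counter

def knn_predict_class(X_train, y_train, x_new, k=3):
    # distances from the new point to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # classification: majority vote among the k neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]

def knn_predict_value(X_train, y_train, x_new, k=3):
    distances = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(distances)[:k]
    # regression: average of the k neighbors' target values
    return y_train[nearest].mean()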
Implementation of K Nearest Neighbors
First, we load the Iris dataset from the sklearn library:
from sklearn.datasets import load_iris

iris_dataset = load_iris()
Splitting the dataset
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], random_state=0)
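As an optional sanity check (not part of the original walkthrough), we can print the shapes of the resulting arrays; by default, train_test_split keeps 75% of the samples for training:

print("X_train shape: {}".format(X_train.shape))
print("X_test shape: {}".format(X_test.shape))
print("y_train shape: {}".format(y_train.shape))
print("y_test shape: {}".format(y_test.shape))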
Visualization of the dataset
import seaborn as sns
import matplotlib.pyplot as plt

# load the dataset
iris = sns.load_dataset('iris')

# visualization
sns.set_style("whitegrid")
sns.FacetGrid(iris, hue="species", height=6).map(plt.scatter, 'sepal_length', 'petal_length').add_legend()
plt.show()
Fitting the model to the training data
# importing the model
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=1)

# fitting the model
knn.fit(X_train, y_train)
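With the model fitted, we can classify a single new measurement. The values below are made up purely for illustration (sepal length, sepal width, petal length, petal width in cm):

import numpy as np

# a hypothetical new flower, invented for illustration
X_new = np.array([[5.0, 2.9, 1.0, 0.2]])
prediction = knn.predict(X_new)
print("Prediction: {}".format(prediction))
print("Predicted target name: {}".format(iris_dataset['target_names'][prediction][0]))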
# predicting the test data
y_pred = knn.predict(X_test)
print("Test set predictions:\n {}".format(y_pred))
print("Predicted target names: {}".format(iris_dataset['target_names'][y_pred]))

# getting the score of the test data
print("Test set score: {:.2f}".format(knn.score(X_test, y_test)))
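Since the introduction noted that the choice of k matters, a simple way to explore it is to refit the classifier for several values of k and compare the test scores. This loop is an illustrative sketch, not part of the original walkthrough:

import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

k_values = range(1, 11)
scores = []
for k in k_values:
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)
    scores.append(model.score(X_test, y_test))

# plot test accuracy against k
plt.plot(k_values, scores, marker='o')
plt.xlabel("n_neighbors (k)")
plt.ylabel("Test set accuracy")
plt.show()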
Advantages of K Nearest Neighbors
- KNN is versatile: the same algorithm can be used for both classification and regression tasks.
- KNN is easy to implement and understand.
- With an appropriately chosen k, KNN is relatively resistant to overfitting.
Disadvantages of K Nearest Neighbors
- KNN can be computationally expensive at prediction time, since it must compute distances to the stored training points, especially when working with large datasets (see the sketch after this list).
- KNN can be sensitive to outliers and noise in the data.
- KNN may have difficulty with high-dimensional data (i.e. data with many features).
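Regarding the computational cost, scikit-learn's KNeighborsClassifier exposes an algorithm parameter ('auto', 'ball_tree', 'kd_tree', or 'brute') that controls how neighbors are searched; on larger, low-dimensional datasets a tree-based index can speed up prediction. A minimal sketch, reusing the training split from above:

from sklearn.neighbors import KNeighborsClassifier

# tree-based neighbor search instead of brute-force distance computation
knn_fast = KNeighborsClassifier(n_neighbors=5, algorithm='kd_tree')
knn_fast.fit(X_train, y_train)
print("Test set score: {:.2f}".format(knn_fast.score(X_test, y_test)))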
Also, read – Implementation of Decision Trees in Python