What is K-Nearest Neighbours?

K-Nearest Neighbours (KNN) is a machine-learning algorithm for classification problems. It's an example of a supervised-learning algorithm. Classification problems are where we are trying to identify which group a data-point belongs to, which requires some existing data to train our model against (hence, supervised learning).

As an example: dog breeds come in many shapes and sizes. Within each breed however, the variation is generally smaller. By looking at the height and weight of a dog, can we work out which breed it most likely is?

Look at the three red crosses - could you tell which breed each of those belongs to? How about the blue cross?

How does KNN work?

The KNN algorithm works by finding the points closest to the location of interest and looking at what groups those points belong to. The number of neighbour points used can be varied, depending on whatever works best for our application.

If we were using K=3, we would look at the three closest neighbours to the point of interest. If they were all the same group, then the point of interest would be assigned to that group. If 2/3 neighbours were Group B and the 3rd point was Group A, then the current point would be assigned Group B, since the 2 neighbours outweighs the 1 other.

In the event of a tie between the count of groups, we can then get more sophisticated by weighting the values closest to the point more heavily etc, but let's not go too far down the rabbit-hole...

Right - enough chat, let's get to work! Move the slider to adjust the value of K and see the effect it has on the classification for each square on the grid. Using the same example and data as above, let's see how KNN would classify every point on the grid for different values of K.

I wrote my own basic KNN algorithm for this - it was great fun!

Note how as K increases, the borders generally become smoother as the classifier becomes more robust. However, this method could be skewed by uneven sample sizes for each group. Just one more fun consideration we need to take into account before doing any actual work.

Note: one of the really nice things about the KNN algorithm is how well it translates to multi-dimensional problems. The example given is looking at only two dimensions, but the method and the maths works in the same way for higher order multi-dimensional problems.