K-Means Clustering — From Zero to Expert

What is K-Means?

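K-Means partitions a dataset into K groups (clusters) so that each point belongs to the cluster whose center (centroid) is nearest to it.

It is an unsupervised learning algorithm: it works on unlabeled data and finds the groups on its own.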

Real-Life Examples

  • Customer segmentation
  • Student grouping
  • Image compression
  • Market research

Steps

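  1. Pick K initial centers (centroids), for example K random data points
  2. Measure the distance from every point to every center
  3. Assign each point to the cluster of its nearest center
  4. Update each center to the mean of the points assigned to it
  5. Repeat steps 2 to 4 until the centers stop moving or an iteration limit is reached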
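
To make the steps concrete, here is a minimal sketch of a single pass through steps 2 to 4 on a toy dataset. The four 1-D points and the two hand-picked starting centers are illustrative assumptions, not data from this tutorial.

```python
import numpy as np

# Toy 1-D data (illustrative values, assumed for this sketch)
X = np.array([[1.0], [2.0], [9.0], [10.0]])

# Step 1: two starting centers, picked by hand here instead of at random
centroids = np.array([[1.0], [2.0]])

# Step 2: distance from every point to every center, shape (n_points, k)
distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)

# Step 3: each point joins the cluster of its nearest center
labels = distances.argmin(axis=1)

# Step 4: each center moves to the mean of its assigned points
new_centroids = np.array(
    [X[labels == j].mean(axis=0) for j in range(len(centroids))]
)

print(labels)                 # [0 1 1 1]
print(new_centroids.ravel())  # [1. 7.]
```

After this one pass the second center has moved from 2 to 7. Repeating the same steps would reassign the point at 2 to the first cluster, which is why the algorithm loops until nothing changes.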

Core Code

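import numpy as np

def kmeans(X, k, max_iterations=100):
    # Accept plain Python lists as well as NumPy arrays
    X = np.asarray(X, dtype=float)

    # Step 1: pick k distinct data points as the initial centroids
    centroids = X[np.random.choice(X.shape[0], k, replace=False)]

    for _ in range(max_iterations):
        # Steps 2-3: distances has shape (k, n_points);
        # each point gets the index of its nearest centroid
        distances = np.sqrt(((X - centroids[:, None]) ** 2).sum(axis=2))
        labels = np.argmin(distances, axis=0)

        # Step 4: move each centroid to the mean of its assigned points
        new_centroids = []

        for j in range(k):
            pts = X[labels == j]
            if len(pts) == 0:
                # Empty cluster: keep the previous centroid
                new_centroids.append(centroids[j])
            else:
                new_centroids.append(pts.mean(axis=0))

        new_centroids = np.array(new_centroids)

        # Step 5: stop once the centroids no longer move
        if np.allclose(centroids, new_centroids):
            break

        centroids = new_centroids

    return centroids, labels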
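
Because the starting centers are chosen at random, two runs can end with different clusters. One common way to compare results is the within-cluster sum of squared distances, where lower means tighter clusters. The sketch below assumes a helper named `inertia` (a hypothetical name, not part of the code above) and uses a small hand-made dataset for illustration.

```python
import numpy as np

def inertia(X, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    labels = np.asarray(labels)
    # centroids[labels] lines up each point with its own centroid
    diffs = X - centroids[labels]
    return float((diffs ** 2).sum())

# Illustrative data: two tight pairs of points (assumed for this sketch)
X = [[0, 0], [0, 2], [10, 0], [10, 2]]
centroids = [[0, 1], [10, 1]]

good = inertia(X, centroids, [0, 0, 1, 1])  # every point is 1 unit from its centroid
bad = inertia(X, centroids, [0, 1, 0, 1])   # two points sent to the far centroid

print(good)  # 4.0
print(bad)   # 204.0
```

The same helper can be called on the `centroids` and `labels` returned by `kmeans` to pick the best of several runs.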

Mini Project: Customer Segmentation

X = np.array([
    [1000, 2],
    [1200, 3],
    [8000, 12],
    [8500, 11],
    [4000, 6],
    [4200, 7],
])
centroids, labels = kmeans(X, 3)
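
The first column is in the thousands while the second stays below 15, so the Euclidean distance is decided almost entirely by the first column. If both columns should count equally (an assumption; the tutorial does not say what the columns mean), one option is to standardize each column before clustering. A minimal sketch:

```python
import numpy as np

X = np.array([
    [1000, 2],
    [1200, 3],
    [8000, 12],
    [8500, 11],
    [4000, 6],
    [4200, 7],
], dtype=float)

# Z-score each column: subtract its mean, divide by its standard deviation
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

# Every column now has mean 0 and standard deviation 1
print(np.allclose(X_scaled.mean(axis=0), 0.0))  # True
print(np.allclose(X_scaled.std(axis=0), 1.0))   # True
```

Passing `X_scaled` to `kmeans` instead of `X` gives centroids in scaled units, so they need to be converted back before being read as raw values.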

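Clusters: with K = 3, the groups can be read as low-, medium-, and high-value customers. The label numbers themselves are arbitrary, so check the centroids to see which cluster is which.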
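
Since cluster numbers carry no meaning, the names have to be attached afterwards. The sketch below ranks clusters by the first column of their centroid and maps them to Low, Medium, and High. The `labels` array is a hypothetical result written by hand for illustration; an actual run of `kmeans` may number or group the points differently.

```python
import numpy as np

X = np.array([
    [1000, 2],
    [1200, 3],
    [8000, 12],
    [8500, 11],
    [4000, 6],
    [4200, 7],
], dtype=float)

# Hypothetical clustering result (assumed, not produced by a real run)
labels = np.array([2, 2, 0, 0, 1, 1])
centroids = np.array([X[labels == j].mean(axis=0) for j in range(3)])

# Cluster ids sorted from the smallest to the largest first-column centroid
order = np.argsort(centroids[:, 0])
name_of_cluster = {
    int(cluster): name
    for cluster, name in zip(order, ["Low", "Medium", "High"])
}

segments = [name_of_cluster[int(label)] for label in labels]
print(segments)
# ['Low', 'Low', 'High', 'High', 'Medium', 'Medium']
```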

Quiz

What does K stand for?

  • Number of clusters
  • Number of iterations

What kind of algorithm is K-Means?

  • Unsupervised
  • Supervised