K-Means Clustering — From Zero to Expert
What is K-Means?
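K-Means splits a dataset into K groups (clusters) so that each data point belongs to the cluster whose center (centroid) is nearest to it.
It is an unsupervised learning algorithm: it finds the groups without needing labeled examples.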
Real-Life Examples
- Customer segmentation
- Student grouping
- Image compression
- Market research
Steps
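- Pick K initial centers (centroids), for example K random data points
- Measure the distance from every point to every center
- Assign each point to the cluster of its nearest center
- Update each center to the mean of the points assigned to it
- Repeat the last three steps until the centers stop moving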
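As a minimal sketch of one pass through the steps above, the snippet below runs a single measure, assign, update round on four made-up 1-D points with K = 2. The data and the starting centers are assumptions chosen for illustration only.

```python
import numpy as np

# Toy data: four 1-D points and two starting centers (illustrative values).
X = np.array([[1.0], [2.0], [9.0], [10.0]])
centroids = np.array([[1.0], [2.0]])

# Measure distance: Euclidean distance from every point to every center, shape (4, 2).
distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)

# Assign cluster: index of the nearest center for each point.
labels = distances.argmin(axis=1)

# Update centers: mean of the points assigned to each cluster.
new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(2)])

print(labels)         # [0 1 1 1]
print(new_centroids)  # centers move to 1.0 and 7.0
```

After this single pass the point 2.0 is still grouped with 9.0 and 10.0. Running the same three steps again moves it to the first cluster and the centers settle at 1.5 and 9.5, which is why the loop repeats.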
Core Code
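import numpy as np

def kmeans(X, k, max_iterations=100):
    # Accept plain lists as well as arrays by converting to a float array.
    X = np.asarray(X, dtype=float)
    # Start from k distinct data points chosen at random.
    centroids = X[np.random.choice(X.shape[0], k, replace=False)]
    for i in range(max_iterations):
        # distances[j, n] is the Euclidean distance from centroid j to point n.
        distances = np.sqrt(((X - centroids[:, None]) ** 2).sum(axis=2))
        # Assign each point to its nearest centroid.
        labels = np.argmin(distances, axis=0)
        new_centroids = []
        for j in range(k):
            pts = X[labels == j]
            if len(pts) == 0:
                # Empty cluster: keep the previous centroid.
                new_centroids.append(centroids[j])
            else:
                new_centroids.append(pts.mean(axis=0))
        new_centroids = np.array(new_centroids)
        # Stop once the centroids no longer move.
        if np.allclose(centroids, new_centroids):
            break
        centroids = new_centroids
    return centroids, labels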
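The distance line in the function above relies on NumPy broadcasting, which can be hard to read at first. The sketch below, using made-up points and centers, prints the intermediate shapes so the layout (one row per center, one column per point) is visible.

```python
import numpy as np

X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])  # 3 points, 2 features
centroids = np.array([[0.0, 0.0], [6.0, 8.0]])      # k = 2 centers

# centroids[:, None] has shape (2, 1, 2); subtracting it from X (3, 2)
# broadcasts to (2, 3, 2): one difference vector per (center, point) pair.
diff = X - centroids[:, None]

# Sum the squared differences over the feature axis, giving shape (2, 3).
distances = np.sqrt((diff ** 2).sum(axis=2))

# argmin over axis 0 picks the nearest center for each point (each column).
labels = np.argmin(distances, axis=0)

print(diff.shape, distances.shape)  # (2, 3, 2) (2, 3)
print(distances)                    # rows: [0, 5, 10] and [10, 5, 0]
print(labels)                       # [0 0 1]
```

The middle point is exactly 5 units from both centers. np.argmin returns the first minimum, so ties go to the lower-numbered cluster.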
Mini Project: Customer Segmentation
X = np.array([[1000, 2], [1200, 3], [8000, 12], [8500, 11], [4000, 6], [4200, 7]])
centroids, labels = kmeans(X, 3)
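Expected clusters: low-, medium-, and high-value customers. The cluster numbers in labels are arbitrary and can change from run to run, because the starting centers are picked at random.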
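Because the cluster numbers carry no meaning by themselves, one way to attach the Low, Medium, and High names is to rank the clusters by their centroid. The sketch below is hypothetical: name_clusters is a helper introduced here for illustration, and it assumes the first column is a spend-like value (the source does not name the columns). The centroids and labels shown are illustrative values, namely what a run would return if it separates the six customers into their three obvious pairs.

```python
import numpy as np

def name_clusters(centroids, labels, names=("Low", "Medium", "High")):
    """Map arbitrary cluster indices to names ordered by the first feature."""
    centroids = np.asarray(centroids, dtype=float)
    # Cluster indices sorted from the smallest to the largest first-column value.
    order = np.argsort(centroids[:, 0])
    index_to_name = {int(c): names[rank] for rank, c in enumerate(order)}
    return [index_to_name[int(label)] for label in labels]

# Illustrative values (assumed, not the output of an actual run).
centroids = np.array([[8250.0, 11.5], [1100.0, 2.5], [4100.0, 6.5]])
labels = np.array([1, 1, 0, 0, 2, 2])

print(name_clusters(centroids, labels))
# ['Low', 'Low', 'High', 'High', 'Medium', 'Medium']
```

Ranking by a single column is a design shortcut. With features on very different scales, as here, the larger-valued column also dominates the distance calculation, so rescaling the columns before clustering may be worth considering.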
Quiz
What is K?
- Number of clusters
- Iterations
What kind of learning is K-Means?
- Unsupervised
- Supervised