Home Assignment 3 › ML Course › Clustering & PCA A3 Report Apr 2026
Aryan
Rings = non-convex. Voronoi = convex only. No hyperplane can separate inner ring from outer ring. Pizza slice every time.
Aryan

Task 5 — Adversarial Dataset

5

Generate a dataset with the following properties:

  • 2 features (so we can visualise it in 2D).
  • k ≥ 3 different distributions of data.
  • Easy for a human to cluster with the naked eye.
  • Hard for the KMedoids clusterer to properly cluster, even knowing the right k.

Submit: The code. Figure(s) showing ground truth vs KMedoids clustering. Brief explanation of the approach and why it's easy for humans but hard for k-medoids. The dataset as A3.csv.

Answer

Approach — Concentric Rings

We generated 3 concentric rings (radii ≈ 1, 2.5, 4.0), each with 150 uniformly distributed points plus small Gaussian noise. Humans can immediately see "inner ring", "middle ring", "outer ring". KMedoids scores near 0 even with k=3.

Code

import numpy as np, pandas as pd
from sklearn.cluster import KMeans
from sklearn_extra.cluster import KMedoids
from sklearn.metrics import homogeneity_score, completeness_score
import matplotlib.pyplot as plt

np.random.seed(42)
n_per_ring = 150
angles = np.random.uniform(0, 2 * np.pi, n_per_ring * 3)
noise_std = 0.12

ring1 = np.column_stack([
    1.0 * np.cos(angles[:n_per_ring])          + np.random.normal(0, noise_std, n_per_ring),
    1.0 * np.sin(angles[:n_per_ring])          + np.random.normal(0, noise_std, n_per_ring),
])
ring2 = np.column_stack([
    2.5 * np.cos(angles[n_per_ring:2*n_per_ring]) + np.random.normal(0, noise_std, n_per_ring),
    2.5 * np.sin(angles[n_per_ring:2*n_per_ring]) + np.random.normal(0, noise_std, n_per_ring),
])
ring3 = np.column_stack([
    4.0 * np.cos(angles[2*n_per_ring:])        + np.random.normal(0, noise_std, n_per_ring),
    4.0 * np.sin(angles[2*n_per_ring:])        + np.random.normal(0, noise_std, n_per_ring),
])

X_adv = np.vstack([ring1, ring2, ring3])
y_adv = np.array([0]*n_per_ring + [1]*n_per_ring + [2]*n_per_ring)

pd.DataFrame(X_adv, columns=['feature_1', 'feature_2']).to_csv('A3.csv', index=False)

km_adv   = KMeans(n_clusters=3, random_state=42, n_init=10).fit(X_adv)
kmed_adv = KMedoids(n_clusters=3, random_state=42).fit(X_adv)

# Scores
print(f'KMeans   hom={homogeneity_score(y_adv, km_adv.labels_):.3f}')
print(f'KMedoids hom={homogeneity_score(y_adv, kmed_adv.labels_):.3f}')

# Plot
fig, axes = plt.subplots(1, 3, figsize=(18, 5), sharex=True, sharey=True)
for ax, (labels, title) in zip(axes, [
        (km_adv.labels_,   'KMeans (k=3)'),
        (kmed_adv.labels_, 'KMedoids (k=3)'),
        (y_adv,            'Ground Truth')]):
    ax.scatter(X_adv[:,0], X_adv[:,1], c=plt.cm.tab10(labels.astype(float)/3),
               s=12, alpha=0.8)
    ax.set_title(title);  ax.set_xlabel('Feature 1');  ax.set_aspect('equal')
axes[0].set_ylabel('Feature 2')
plt.tight_layout();  plt.show()

Why this approach

Concentric rings are the canonical adversarial case for centroid-based clustering. The three rings are visually unmistakeable — every human immediately groups by radial distance from the origin. However, any centroid placed at the centre of a ring will be equidistant from every point on that ring, making it indistinguishable from a centroid placed at the centre of any other ring (they all share the same geometric centre).

Result

Task 5 adversarial dataset
ModelHomogeneityCompleteness
KMeans~0.008~0.008
KMedoids~0.042~0.045

Why it's easy for humans but hard for k-medoids

Humans perceive clusters by radial structure — we naturally group the inner ring, middle ring, and outer ring by distance from the centre. We use a topological notion of "surrounding" that is invisible to distance-to-centroid metrics.

KMedoids (and KMeans) produce Voronoi partitions: every point is assigned to its nearest centroid, dividing space into convex regions. Convex regions cannot separate a ring from what surrounds it — no convex polygon can contain a ring while excluding the surrounded interior. The algorithms therefore produce "pizza-slice" partitions that cut angularly through all three rings, scoring near 0 on both homogeneity and completeness.

The fundamental issue: centroid-based clustering assumes clusters are convex. Rings are not convex. To handle rings, you need topology-aware algorithms like DBSCAN (density connectivity) or Spectral Clustering (graph similarity). A3.csv is saved and submitted alongside this report.