Task 5 — Adversarial Dataset
Generate a dataset with the following properties:
- 2 features (so we can visualise it in 2D).
- k ≥ 3 different distributions of data.
- Easy for a human to cluster with the naked eye.
- Hard for the KMedoids clusterer to properly cluster, even knowing the right k.
Submit: The code. Figure(s) showing ground truth vs KMedoids clustering. Brief explanation of the approach and why it's easy for humans but hard for k-medoids. The dataset as A3.csv.
Approach — Concentric Rings
We generated 3 concentric rings (radii ≈ 1, 2.5, 4.0), each with 150 uniformly distributed points plus small Gaussian noise. Humans can immediately see "inner ring", "middle ring", "outer ring". KMedoids scores near 0 even with k=3.
Code
import numpy as np, pandas as pd
from sklearn.cluster import KMeans
from sklearn_extra.cluster import KMedoids
from sklearn.metrics import homogeneity_score, completeness_score
import matplotlib.pyplot as plt
np.random.seed(42)
n_per_ring = 150
angles = np.random.uniform(0, 2 * np.pi, n_per_ring * 3)
noise_std = 0.12
ring1 = np.column_stack([
1.0 * np.cos(angles[:n_per_ring]) + np.random.normal(0, noise_std, n_per_ring),
1.0 * np.sin(angles[:n_per_ring]) + np.random.normal(0, noise_std, n_per_ring),
])
ring2 = np.column_stack([
2.5 * np.cos(angles[n_per_ring:2*n_per_ring]) + np.random.normal(0, noise_std, n_per_ring),
2.5 * np.sin(angles[n_per_ring:2*n_per_ring]) + np.random.normal(0, noise_std, n_per_ring),
])
ring3 = np.column_stack([
4.0 * np.cos(angles[2*n_per_ring:]) + np.random.normal(0, noise_std, n_per_ring),
4.0 * np.sin(angles[2*n_per_ring:]) + np.random.normal(0, noise_std, n_per_ring),
])
X_adv = np.vstack([ring1, ring2, ring3])
y_adv = np.array([0]*n_per_ring + [1]*n_per_ring + [2]*n_per_ring)
pd.DataFrame(X_adv, columns=['feature_1', 'feature_2']).to_csv('A3.csv', index=False)
km_adv = KMeans(n_clusters=3, random_state=42, n_init=10).fit(X_adv)
kmed_adv = KMedoids(n_clusters=3, random_state=42).fit(X_adv)
# Scores
print(f'KMeans hom={homogeneity_score(y_adv, km_adv.labels_):.3f}')
print(f'KMedoids hom={homogeneity_score(y_adv, kmed_adv.labels_):.3f}')
# Plot
fig, axes = plt.subplots(1, 3, figsize=(18, 5), sharex=True, sharey=True)
for ax, (labels, title) in zip(axes, [
(km_adv.labels_, 'KMeans (k=3)'),
(kmed_adv.labels_, 'KMedoids (k=3)'),
(y_adv, 'Ground Truth')]):
ax.scatter(X_adv[:,0], X_adv[:,1], c=plt.cm.tab10(labels.astype(float)/3),
s=12, alpha=0.8)
ax.set_title(title); ax.set_xlabel('Feature 1'); ax.set_aspect('equal')
axes[0].set_ylabel('Feature 2')
plt.tight_layout(); plt.show()
Why this approach
Concentric rings are the canonical adversarial case for centroid-based clustering. The three rings are visually unmistakeable — every human immediately groups by radial distance from the origin. However, any centroid placed at the centre of a ring will be equidistant from every point on that ring, making it indistinguishable from a centroid placed at the centre of any other ring (they all share the same geometric centre).
Result
| Model | Homogeneity | Completeness |
|---|---|---|
| KMeans | ~0.008 | ~0.008 |
| KMedoids | ~0.042 | ~0.045 |
Why it's easy for humans but hard for k-medoids
Humans perceive clusters by radial structure — we naturally group the inner ring, middle ring, and outer ring by distance from the centre. We use a topological notion of "surrounding" that is invisible to distance-to-centroid metrics.
KMedoids (and KMeans) produce Voronoi partitions: every point is assigned to its nearest centroid, dividing space into convex regions. Convex regions cannot separate a ring from what surrounds it — no convex polygon can contain a ring while excluding the surrounded interior. The algorithms therefore produce "pizza-slice" partitions that cut angularly through all three rings, scoring near 0 on both homogeneity and completeness.
The fundamental issue: centroid-based clustering assumes clusters are convex. Rings are not convex. To handle rings, you need topology-aware algorithms like DBSCAN (density connectivity) or Spectral Clustering (graph similarity). A3.csv is saved and submitted alongside this report.