photo-content-recognition-curation-expert

by @erichowens in AI & LLM

# Install this skill:

npx skills add erichowens/some_claude_skills --skill "photo-content-recognition-curation-expert"

Install specific skill from multi-skill repository

# Description

Expert in photo content recognition, intelligent curation, and quality filtering. Specializes in face/animal/place recognition, perceptual hashing for de-duplication, screenshot/meme detection, burst photo selection, and quick indexing strategies. Activate on 'face recognition', 'face clustering', 'perceptual hash', 'near-duplicate', 'burst photo', 'screenshot detection', 'photo curation', 'photo indexing', 'NSFW detection', 'pet recognition', 'DINOHash', 'HDBSCAN faces'. NOT for GPS-based location clustering (use event-detection-temporal-intelligence-expert), color palette extraction (use color-theory-palette-harmony-expert), semantic image-text matching (use clip-aware-embeddings), or video analysis/frame extraction.

# SKILL.md

name: photo-content-recognition-curation-expert
description: Expert in photo content recognition, intelligent curation, and quality filtering. Specializes in face/animal/place recognition, perceptual hashing for de-duplication, screenshot/meme detection, burst photo selection, and quick indexing strategies. Activate on 'face recognition', 'face clustering', 'perceptual hash', 'near-duplicate', 'burst photo', 'screenshot detection', 'photo curation', 'photo indexing', 'NSFW detection', 'pet recognition', 'DINOHash', 'HDBSCAN faces'. NOT for GPS-based location clustering (use event-detection-temporal-intelligence-expert), color palette extraction (use color-theory-palette-harmony-expert), semantic image-text matching (use clip-aware-embeddings), or video analysis/frame extraction.
allowed-tools: Read,Write,Edit,Bash,Grep,Glob,mcp__firecrawl__firecrawl_search,WebFetch
category: AI & Machine Learning
tags:
- face-recognition
- deduplication
- curation
- indexing
- nsfw
pairs-with:
- skill: event-detection-temporal-intelligence-expert
reason: Temporal context for photos
- skill: wedding-immortalist
reason: Curate wedding photo collections

Photo Content Recognition & Curation Expert

Expert in photo content analysis and intelligent curation. Combines classical computer vision with modern deep learning for comprehensive photo analysis.

When to Use This Skill

✅ Use for:
- Face recognition and clustering (identifying important people)
- Animal/pet detection and clustering
- Near-duplicate detection using perceptual hashing (DINOHash, pHash, dHash)
- Burst photo selection (finding best frame from 10-50 shots)
- Screenshot vs photo classification
- Meme/download filtering
- NSFW content detection
- Quick indexing for large photo libraries (10K+)
- Aesthetic quality scoring (NIMA)

❌ NOT for:
- GPS-based location clustering → event-detection-temporal-intelligence-expert
- Color palette extraction → color-theory-palette-harmony-expert
- Semantic image-text matching → clip-aware-embeddings
- Video analysis or frame extraction

Quick Decision Tree

What do you need to recognize/filter?
│
├─ Duplicate photos? ─────────────────────────────── Perceptual Hashing
│   ├─ Exact duplicates? ──────────────────────────── dHash (fastest)
│   ├─ Brightness/contrast changes? ───────────────── pHash (DCT-based)
│   ├─ Heavy crops/compression? ───────────────────── DINOHash (2025 SOTA)
│   └─ Production system? ─────────────────────────── Hybrid (pHash → DINOHash)
│
├─ People in photos? ─────────────────────────────── Face Clustering
│   ├─ Known thresholds? ──────────────────────────── Apple-style Agglomerative
│   └─ Unknown data distribution? ─────────────────── HDBSCAN
│
├─ Pets/Animals? ─────────────────────────────────── Pet Recognition
│   ├─ Detection? ─────────────────────────────────── YOLOv8
│   └─ Individual clustering? ─────────────────────── CLIP + HDBSCAN
│
├─ Best from burst? ──────────────────────────────── Burst Selection
│   └─ Score: sharpness + face quality + aesthetics
│
└─ Filter junk? ──────────────────────────────────── Content Detection
    ├─ Screenshots? ───────────────────────────────── Multi-signal classifier
    └─ NSFW? ──────────────────────────────────────── Safety classifier

Core Concepts

1. Perceptual Hashing for Near-Duplicate Detection

Problem: Camera bursts, re-saved images, and minor edits create near-duplicates.

Solution: Perceptual hashes generate similar values for visually similar images.

Method Comparison:

Method	Speed	Robustness	Best For
dHash	Fastest	Low	Exact duplicates
pHash	Fast	Medium	Brightness/contrast changes
DINOHash	Slower	High	Heavy crops, compression
Hybrid	Medium	Very High	Production systems

Hybrid Pipeline (2025 Best Practice):
1. Stage 1: Fast pHash filtering (eliminates obvious non-duplicates)
2. Stage 2: DINOHash refinement (accurate detection)
3. Stage 3: Optional Siamese ViT verification

Hamming Distance Thresholds:
- Conservative: ≤5 bits different = duplicates
- Aggressive: ≤10 bits different = duplicates

→ Deep dive: references/perceptual-hashing.md

2. Face Recognition & Clustering

Goal: Group photos by person without user labeling.

Apple Photos Strategy (2021-2025):
1. Extract face + upper body embeddings (FaceNet, 512-dim)
2. Two-pass agglomerative clustering
3. Conservative first pass (threshold=0.4, high precision)
4. HAC second pass (threshold=0.6, increase recall)
5. Incremental updates for new photos

HDBSCAN Alternative:
- No threshold tuning required
- Robust to noise
- Better for unknown data distributions

Parameters:

Setting	Agglomerative	HDBSCAN
Pass 1 threshold	0.4 (cosine)	-
Pass 2 threshold	0.6 (cosine)	-
Min cluster size	-	3 photos
Metric	cosine	cosine

→ Deep dive: references/face-clustering.md

3. Burst Photo Selection

Problem: Burst mode creates 10-50 nearly identical photos.

Multi-Criteria Scoring:

Criterion	Weight	Measurement
Sharpness	30%	Laplacian variance
Face Quality	35%	Eyes open, smiling, face sharpness
Aesthetics	20%	NIMA score
Position	10%	Middle frames bonus
Exposure	5%	Histogram clipping check

Burst Detection: Photos within 0.5 seconds of each other.

→ Deep dive: references/content-detection.md

4. Screenshot Detection

Multi-Signal Approach:

Signal	Confidence	Description
UI elements	0.85	Status bars, buttons detected
Perfect rectangles	0.75	>5 UI buttons (90° angles)
High text	0.70	>25% text coverage (OCR)
No camera EXIF	0.60	Missing Make/Model/Lens
Device aspect	0.60	Exact phone screen ratio
Perfect sharpness	0.50	>2000 Laplacian variance

Decision: Confidence >0.6 = screenshot

→ Deep dive: references/content-detection.md

5. Quick Indexing Pipeline

Goal: Index 10K+ photos efficiently with caching.

Features Extracted:
- Perceptual hashes (de-duplication)
- Face embeddings (people clustering)
- CLIP embeddings (semantic search)
- Color palettes
- Aesthetic scores

Performance (10K photos, M1 MacBook Pro):

Operation	Time
Perceptual hashing	2 min
CLIP embeddings	3 min (GPU)
Face detection	4 min
Color palettes	1 min
Aesthetic scoring	2 min (GPU)
Clustering + dedup	1 min
Total (first run)	~13 min
Incremental	<1 min

→ Deep dive: references/photo-indexing.md

Common Anti-Patterns

Anti-Pattern: Euclidean Distance for Face Embeddings

What it looks like:

distance = np.linalg.norm(embedding1 - embedding2)  # WRONG

Why it's wrong: Face embeddings are normalized; cosine similarity is the correct metric.

What to do instead:

from scipy.spatial.distance import cosine
distance = cosine(embedding1, embedding2)  # Correct

Anti-Pattern: Fixed Clustering Thresholds

What it looks like: Using same distance threshold for all face clusters.

Why it's wrong: Different people have varying intra-class variance (twins vs. diverse ages).

What to do instead: Use HDBSCAN for automatic threshold discovery, or two-pass clustering with conservative + relaxed passes.

Anti-Pattern: Raw Pixel Comparison for Duplicates

What it looks like:

is_duplicate = np.allclose(img1, img2)  # WRONG

Why it's wrong: Re-saved JPEGs, crops, brightness changes create pixel differences.

What to do instead: Perceptual hashing (pHash or DINOHash) with Hamming distance.

Anti-Pattern: Sequential Face Detection

What it looks like: Processing faces one photo at a time without batching.

Why it's wrong: GPU underutilization, 10x slower than batched.

What to do instead: Batch process images (batch_size=32) with GPU acceleration.

Anti-Pattern: No Confidence Filtering

What it looks like:

for face in all_detected_faces:
    cluster(face)  # No filtering

Why it's wrong: Low-confidence detections create noise clusters (hands, objects).

What to do instead: Filter by confidence (threshold 0.9 for faces).

Anti-Pattern: Forcing Every Photo into Clusters

What it looks like: Assigning noise points to nearest cluster.

Why it's wrong: Solo appearances shouldn't pollute person clusters.

What to do instead: HDBSCAN/DBSCAN naturally identifies noise (label=-1). Keep noise separate.

Quick Start

from photo_curation import PhotoCurationPipeline

pipeline = PhotoCurationPipeline()

# Index photo library
index = pipeline.index_library('/path/to/photos')

# De-duplicate
duplicates = index.find_duplicates()
print(f"Found {len(duplicates)} duplicate groups")

# Cluster faces
face_clusters = index.cluster_faces()
print(f"Found {len(face_clusters)} people")

# Select best from bursts
best_photos = pipeline.select_best_from_bursts(index)

# Filter screenshots
real_photos = pipeline.filter_screenshots(index)

# Curate for collage
collage_photos = pipeline.curate_for_collage(index, target_count=100)

Python Dependencies

torch transformers facenet-pytorch ultralytics hdbscan opencv-python scipy numpy scikit-learn pillow pytesseract

Integration Points

event-detection-temporal-intelligence-expert: Provides temporal event clustering for event-aware curation
color-theory-palette-harmony-expert: Extracts color palettes for visual diversity
collage-layout-expert: Receives curated photos for assembly
clip-aware-embeddings: Provides CLIP embeddings for semantic search and DeepDBSCAN

References

DINOHash (2025): "Adversarially Fine-Tuned DINOv2 Features for Perceptual Hashing"
Apple Photos (2021): "Recognizing People in Photos Through Private On-Device ML"
HDBSCAN: "Hierarchical Density-Based Spatial Clustering" (2013-2025)
Perceptual Hashing: dHash (Neal Krawetz), DCT-based pHash

Version: 2.0.0
Last Updated: November 2025

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents:

⚡ Amp 🚀 Antigravity 🤖 Claude Code 🦀 Clawdbot 📝 Codex ▶️ Cursor 🤖 Droid 💎 Gemini CLI 🐙 GitHub Copilot 🪿 Goose 📊 Kilo Code 🔧 Kiro CLI 💻 OpenCode 🦘 Roo Code 🌲 Trae 🏄 Windsurf

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.

photo-content-recognition-curation-expert

# Description

# SKILL.md

Photo Content Recognition & Curation Expert

When to Use This Skill

Quick Decision Tree

Core Concepts

1. Perceptual Hashing for Near-Duplicate Detection

2. Face Recognition & Clustering

3. Burst Photo Selection

4. Screenshot Detection

5. Quick Indexing Pipeline

Common Anti-Patterns

Anti-Pattern: Euclidean Distance for Face Embeddings

Anti-Pattern: Fixed Clustering Thresholds

Anti-Pattern: Raw Pixel Comparison for Duplicates

Anti-Pattern: Sequential Face Detection

Anti-Pattern: No Confidence Filtering

Anti-Pattern: Forcing Every Photo into Clusters

Quick Start

Python Dependencies

Integration Points

References

# Related Skills

# Supported AI Coding Agents

Confirm

Submit a Skill