# contributor-codebase-analyzer

by anivar
# Install this skill:
npx skills add anivar/contributor-codebase-analyzer

Or install this specific skill directly: npx add-skill https://github.com/anivar/contributor-codebase-analyzer

# Description

Deep-dive code analysis with periodic saving. Contributor mode reads every commit diff for annual reviews, accuracy rates, and promotion readiness. Codebase mode maps repository structure, cross-repo relationships, and enterprise governance. Works with GitHub (gh) and GitLab (glab). Saves checkpoints incrementally for resume across sessions.

# SKILL.md


```yaml
---
name: contributor-codebase-analyzer
description: >
  Deep-dive code analysis with periodic saving. Contributor mode reads every
  commit diff for annual reviews, accuracy rates, and promotion readiness.
  Codebase mode maps repository structure, cross-repo relationships, and
  enterprise governance. Works with GitHub (gh) and GitLab (glab). Saves
  checkpoints incrementally for resume across sessions.
license: MIT
user-invocable: true
agentic: true
compatibility: "Requires git and either gh (GitHub CLI) or glab (GitLab CLI). Optional: jq, bc."
metadata:
  author: anivar
  version: 3.0.0
  tags: contributor-review, codebase-analysis, enterprise-governance, periodic-saving, github, gitlab
allowed-tools: Bash(git:*) Bash(gh:*) Bash(glab:*) Bash(jq:*) Bash(bc:*)
---
```


Contributor Codebase Analyzer

Deep-dive code analysis with periodic saving. Two modes:

  • Contributor mode — reads every commit diff, calculates accuracy, assesses promotion readiness
  • Codebase mode — maps repo structure, cross-repo relationships, enterprise governance

Works with GitHub (gh) and GitLab (glab). Saves checkpoints to $PROJECT/.cca/ for resume across sessions.

Getting Started

First-time users: run onboarding to detect your platform and configure the skill.

./scripts/checkpoint.sh onboard

This will:
1. Detect your git platform (GitHub or GitLab)
2. Identify the repo and org/group
3. Create .cca/ directory with config
4. Verify CLI tools are available
5. Optionally add your first contributor to track

See references/onboarding.md for the full guided setup.

Mode Detection

| Trigger | Mode | Action |
|---|---|---|
| "analyze @user" / "annual review" / "promotion" / "contributor" | Contributor | Deep-dive commit analysis |
| "analyze repo" / "codebase" / "architecture" / "governance" / "dependencies" | Codebase | Repository structure analysis |
| "compare engineers" / "team comparison" | Contributor | Multi-engineer comparison |
| "ownership" / "SPOF" / "who owns" | Contributor | Production ownership mapping |
| "tech debt" / "security audit" / "portfolio" | Codebase | Governance analysis |
| "resume" / "checkpoint" / "continue analysis" | Either | Load last checkpoint, resume |
| "onboard" / "setup" / "getting started" | Setup | Run onboarding flow |

Platform Support

All analysis uses local git for commit-level work. Platform CLIs are used only for PR/MR metadata:

| Feature | GitHub (gh) | GitLab (glab) |
|---|---|---|
| PR/MR counts | `gh search prs` | `glab mr list` |
| Reviews | `gh search prs --reviewed-by` | `glab mr list --reviewer` |
| User lookup | `gh api users/NAME` | `glab api users?username=NAME` |
| Org repos | `gh repo list ORG` | `glab project list --group GROUP` |
| API access | `gh api` | `glab api` |
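For example, counting a contributor's merged PRs for a year on GitHub might look like the sketch below; the repo, username, and date range are placeholders, and the flags assume a reasonably recent gh release.

```bash
# Count merged PRs authored by a contributor in 2025 (GitHub example; values are placeholders).
gh search prs \
  --repo org/repo \
  --author alice-dev \
  --merged \
  --created "2025-01-01..2025-12-31" \
  --json number --jq 'length'
```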

Auto-detection: The skill reads git remote URLs to determine the platform. No manual configuration needed.
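A minimal sketch of that detection, assuming the platform can be inferred from the origin remote (self-hosted instances would need explicit configuration):

```bash
# Infer the platform from the origin remote URL (illustrative only, not the skill's exact implementation).
remote_url=$(git remote get-url origin)
case "$remote_url" in
  *github.com*) platform="github" ;;
  *gitlab.com*) platform="gitlab" ;;
  *)            platform="unknown" ;;  # self-hosted instances need explicit config
esac
echo "Detected platform: $platform"
```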

Periodic Saving

All analysis saves incrementally to $PROJECT/.cca/. See references/periodic-saving.md.

$PROJECT/.cca/
├── contributors/@username/
│   ├── profile.jsonl            # Append-only analysis runs
│   ├── checkpoints/2025-Q1.md   # Quarterly snapshots
│   ├── latest-review.md         # Most recent annual review
│   └── .last_analyzed           # ISO timestamp + last SHA
├── codebase/
│   ├── structure.json           # Repo structure map
│   ├── dependencies.json        # Dependency catalog
│   └── .last_analyzed
├── governance/
│   ├── portfolio.json           # Technology portfolio
│   ├── debt-registry.json       # Technical debt items
│   └── .last_analyzed
└── .cca-config.json             # Skill configuration

Resume protocol: On every invocation, check .last_analyzed files. If prior state exists, resume from the gap — never re-analyze already-saved work.
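A rough sketch of that check, assuming .last_analyzed keeps the last analyzed SHA on its second line (the skill's actual file layout may differ):

```bash
# Decide FRESH / CURRENT / INCREMENTAL for a contributor (illustrative; file layout and email are assumed).
state_file=".cca/contributors/@alice/.last_analyzed"
if [ ! -f "$state_file" ]; then
  echo "FRESH"                            # no prior state: run a full analysis
else
  last_sha=$(sed -n '2p' "$state_file")   # assumed: line 2 holds the last analyzed SHA
  new=$(git rev-list --count "${last_sha}..HEAD" --author="alice@example.com")
  if [ "$new" -eq 0 ]; then
    echo "CURRENT"                        # nothing new since the checkpoint
  else
    echo "INCREMENTAL ($new new commits)"
  fi
fi
```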

Quick Reference

Contributor Mode

Step 0 — Check before analyzing (mandatory):

./scripts/checkpoint.sh check contributors/@USERNAME --author EMAIL
  • FRESH → run full analysis
  • CURRENT → skip, already analyzed, no new commits
  • INCREMENTAL → analyze only new commits since last checkpoint

Count commits before launching agents:

git log --author="EMAIL" --after="YEAR-01-01" --before="YEAR+1-01-01" --oneline | wc -l

Batch sizing (hard limits from real failures):

| Commits | Action |
|---|---|
| <=40 | Read in main session |
| 41-70 | Single agent writes findings to file |
| 71-90 | Split into 2 agents |
| 91+ | WILL FAIL — split into 3+ or monthly agents |

Agents write their findings to files and return 3-line summaries. Never return raw analysis inline.
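A small helper that maps a commit count onto these limits, assuming EMAIL and YEAR are provided by the caller:

```bash
# Pick an agent strategy from the commit count (thresholds mirror the table above).
count=$(git log --author="$EMAIL" --after="$YEAR-01-01" --before="$((YEAR + 1))-01-01" --oneline | wc -l)
if   [ "$count" -le 40 ]; then echo "$count commits: read directly in the main session"
elif [ "$count" -le 70 ]; then echo "$count commits: one agent, findings written to file"
elif [ "$count" -le 90 ]; then echo "$count commits: split across 2 agents"
else                           echo "$count commits: split across 3+ agents or by month"
fi
```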

7-phase annual review process:
1. Identity Discovery — find all git email variants (see the sketch after this list)
2. Metrics — commits, PRs/MRs, reviews, lines (git + platform CLI)
3. Read ALL Diffs — quarterly parallel agents, file-based output
4. Bug Introduction — self-reverts, crash-fixes, same-day fixes, hook bypass
5. Code Quality — anti-patterns and strengths from diff reading
6. Report Generation — structured markdown with growth assessment + development plan
7. Comparison — multi-engineer strengths comparison with evidence
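For Phase 1, a minimal way to surface an engineer's email variants from git history (the name filter is a placeholder):

```bash
# List every author identity in the repo, most frequent first, filtered to one engineer.
git log --all --format='%an <%ae>' | sort | uniq -c | sort -rn | grep -i "alice"
```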

Accuracy rate (a quick approximation is sketched below the table):

Effective Accuracy = 100% - (fix-related commits / total commits × 100%)

| Rate | Assessment |
|---|---|
| >90% | Excellent |
| 85-90% | Good |
| 80-85% | Concerning |
| <80% | Needs focused improvement |
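The sketch below approximates the rate from commit subjects with a keyword heuristic; the real process reads the diffs, so treat this only as a first pass (requires bc, and the email and window are placeholders):

```bash
# Approximate effective accuracy from commit subjects (keyword match is a heuristic, not the real method).
total=$(git log --author="$EMAIL" --after="2025-01-01" --before="2026-01-01" --oneline | wc -l)
fixes=$(git log --author="$EMAIL" --after="2025-01-01" --before="2026-01-01" --oneline \
          -i --grep='fix' --grep='revert' --grep='hotfix' | wc -l)
echo "scale=1; 100 - ($fixes / $total) * 100" | bc   # guard against total=0 in real use
```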

Tool separation:
- Platform CLI (gh/glab): Get commit lists, PR/MR counts, review counts, user lookup
- Local git: Read commit diffs, blame, shortlog from cloned repo (faster, no rate limits)
- Use CLI to discover what to analyze, use local repo to read the actual code

Codebase Mode

Three tiers of analysis:

| Tier | Scope | Output |
|---|---|---|
| Repo Structure | Single repo internals | `codebase/structure.json` |
| Cross-Repo | Multi-repo relationships | `codebase/dependencies.json` |
| Governance | Enterprise portfolio | `governance/portfolio.json` |

Cross-repo analysis:

# GitHub
gh repo list ORG --limit 100 --json name,language,updatedAt

# GitLab
glab project list --group GROUP --per-page 100 -o json

API Rate Limits

Contributor analysis is mostly rate-limit-free (Phases 3-7 use local git only). Cross-repo analysis (Tier 2-3) loops over org repos via API — check limits before heavy operations:

./scripts/checkpoint.sh ratelimit

If rate-limited mid-scan, progress is saved automatically. Resume skips already-processed repos.
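For GitHub, a pre-flight check could look like the sketch below (gh api rate_limit is a standard endpoint; GitLab reports limits via response headers, which would need separate handling). The threshold is arbitrary.

```bash
# Abort an org-wide scan early if the remaining GitHub API budget looks too thin.
remaining=$(gh api rate_limit --jq '.resources.core.remaining')
if [ "$remaining" -lt 100 ]; then
  reset=$(gh api rate_limit --jq '.resources.core.reset')
  echo "Only $remaining requests left; limit resets at $(date -d "@$reset" 2>/dev/null || date -r "$reset")"
  exit 1
fi
```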

Checkpoint Commands

# Onboard (first-time setup)
./scripts/checkpoint.sh onboard

# Save current state
./scripts/checkpoint.sh save contributors/@alice

# Resume from last checkpoint
./scripts/checkpoint.sh resume contributors/@alice

# Show checkpoint status
./scripts/checkpoint.sh status

Priority-Ordered References

| Priority | Reference | Impact | Mode |
|---|---|---|---|
| 0 | onboarding.md | SETUP | Both |
| 1 | periodic-saving.md | CRITICAL | Both |
| 2 | contributor-analysis.md | CRITICAL | Contributor |
| 3 | accuracy-analysis.md | HIGH | Contributor |
| 4 | code-quality-catalog.md | HIGH | Contributor |
| 5 | qualitative-judgment.md | HIGH | Contributor |
| 6 | report-templates.md | HIGH | Contributor |
| 7 | codebase-analysis.md | HIGH | Codebase |

Problem to Reference Mapping

| Problem | Start With |
|---|---|
| First time using this skill | onboarding.md |
| Annual review for 1 engineer | contributor-analysis.md then report-templates.md |
| Comparing 2+ engineers | contributor-analysis.md then qualitative-judgment.md |
| Engineer has 200+ commits | contributor-analysis.md (batch sizing section) |
| Resume interrupted analysis | periodic-saving.md |
| Is this engineer promotion-ready? | qualitative-judgment.md then accuracy-analysis.md |
| Who owns the payment system? | contributor-analysis.md (production ownership section) |
| Map repo architecture | codebase-analysis.md (Tier 1) |
| Cross-repo dependencies | codebase-analysis.md (Tier 2) |
| Enterprise tech portfolio | codebase-analysis.md (Tier 3) |
| Quality assessment from code | code-quality-catalog.md then accuracy-analysis.md |
| Plateau detection | qualitative-judgment.md (growth trajectory section) |
| Tech debt inventory | codebase-analysis.md (governance section) |

QMD Pairing

This skill complements QMD (knowledge search). Division of responsibility:

| Concern | Tool |
|---|---|
| Search documentation, wikis, specs | QMD |
| Analyze commit diffs, code quality | Contributor Codebase Analyzer |
| Find API references, tutorials | QMD |
| Map repository structure | Contributor Codebase Analyzer |
| Answer "how does X work?" | QMD |
| Answer "who built X and how well?" | Contributor Codebase Analyzer |

Usage Examples

# First-time setup
"Set up contributor-codebase-analyzer for this repo"

# Annual review — provide GitHub/GitLab username (email auto-discovered from git log)
"Analyze github.com/alice-dev for 2025 annual review in repo org/repo"

# Multi-engineer comparison
"Analyze github.com/alice-dev, github.com/bob-eng, gitlab.com/charlie for 2025 reviews.
 I need to decide which 2 get promoted."

# Production ownership mapping
"Analyze production code ownership in this repo"

# Resume interrupted analysis
"Resume the contributor analysis for github.com/alice-dev"

# Repository structure analysis
"Analyze the codebase structure of this repo"

# Cross-repo dependency mapping (works with GitHub orgs or GitLab groups)
"Map dependencies across all repos in our org"

# Enterprise governance audit
"Run a governance analysis: tech portfolio, debt registry, security posture"

# Checkpoint status
"Show me the current analysis checkpoint status"

Full Compiled Document

For the complete guide with all references expanded: AGENTS.md

# README.md

Contributor Codebase Analyzer

Read every commit. Grow every engineer.


DORA metrics tell you how fast you're moving. They don't tell you where you're going.

Dashboards full of PR volume, commit counts, and "impact scores" actively punish your best architects — the ones simplifying complexity rather than adding to it.

This Agent Skill values brevity over volume and design patterns over code churn. It brings the rigor of manual code review to agentic workflows — reading every commit diff, not a sample.

The Problem

Engineering reviews rely on vanity metrics and manager impressions. These miss what matters: an engineer who reduces a payment processor from 2,000 lines to 400 looks like "low output" on a dashboard. An engineer repeatedly shipping debug code to production looks like "high output."

No dashboard shows you the difference. Reading the code does.

The Solution

This skill doesn't count lines. It reads the diffs.

  • Reads every commit diff for each contributor, quarterly
  • Calculates an accuracy baseline — how often code ships without needing fixes
  • Detects design improvements and anti-patterns from actual changes
  • Maps codebase structure, cross-repo dependencies, and technical debt
  • Generates growth reviews with specific commit evidence

No scores. No rankings. Accuracy rates surface where to look deeper — not verdicts.

All analysis saves incrementally — built for agentic context limits. Interrupt and resume without losing progress.

Getting Started

1. Prerequisites

  • git (required)
  • gh — GitHub CLI (required for GitHub repos)
  • glab — GitLab CLI (required for GitLab repos)
  • jq and bc (optional, for structured output and calculations)

2. Install

npx skills add anivar/contributor-codebase-analyzer -g

This auto-detects your AI agents (Claude Code, Cursor, Gemini CLI, GitHub Copilot, and others) and installs the skill to all of them.

Manual install:

git clone https://github.com/anivar/contributor-codebase-analyzer.git
ln -s "$(pwd)/contributor-codebase-analyzer" ~/.agents/skills/contributor-codebase-analyzer

3. Onboard

Navigate to your project and run:

./scripts/checkpoint.sh onboard

This auto-detects your platform (GitHub/GitLab), repo, and org. No manual configuration needed.

4. Use

"Analyze github.com/alice-dev for 2025 annual review in repo org/repo"

"Compare github.com/alice-dev and github.com/bob-eng for 2025. Who should be promoted?"

"Analyze the codebase structure of this repo"

"Map dependencies across all repos in our org"

"Run a governance analysis: tech portfolio, debt registry, security posture"

What You Get

Contributor Reviews

  • Accuracy rate: 100% - (fix-related commits / total commits × 100%) — a baseline that surfaces where to look deeper
  • Anti-pattern detection: debug code shipped, empty catch blocks, hook bypass, mega-commits
  • Strength identification: defensive programming, offline-first design, code reduction, feature gating
  • Quarter-by-quarter breakdown: growth trajectory, complexity trends, domain breadth
  • Development assessment: readiness signals, growth areas, and next-level plan
  • Multi-engineer comparison: complementary strengths with contextual evidence

Codebase Reports

  • Repository structure: module map, entry points, architecture patterns, dependencies
  • Cross-repo relationships: shared libraries, internal packages, dependency graph
  • Enterprise governance: technology portfolio, technical debt registry, security posture

Periodic Checkpoints

All work saves to $PROJECT/.cca/ in append-only JSONL format. Resume from any phase — never re-analyze what's already been processed.
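As an illustration, appending one run record might look like the sketch below; the field names are hypothetical, not the skill's actual schema, and the commit count is a placeholder.

```bash
# Append one analysis-run record to the contributor's profile (field names are hypothetical).
mkdir -p .cca/contributors/@alice
printf '{"run":"%s","commits_read":%d,"last_sha":"%s"}\n' \
  "$(date -u +%Y-%m-%dT%H:%M:%SZ)" 42 "$(git rev-parse HEAD)" \
  >> .cca/contributors/@alice/profile.jsonl
```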

Platform Support

Works with both GitHub and GitLab (including nested subgroups). Platform is auto-detected from your git remote URL.

| Feature | GitHub | GitLab |
|---|---|---|
| PR/MR metadata | gh CLI | glab CLI |
| Code analysis | Local git | Local git |
| Org discovery | `gh repo list` | `glab project list` |

How It Works

The skill follows a 7-phase process for contributor analysis:

  1. Identity Discovery — finds all git email variants automatically
  2. Metrics — commits, PRs/MRs, reviews, lines changed
  3. Read ALL Diffs — quarterly parallel agents with batch sizing to prevent failures
  4. Bug Introduction — self-reverts, crash-fixes, same-day fixes, hook bypass
  5. Code Quality — anti-patterns and strengths from actual diffs
  6. Report Generation — structured markdown with growth assessment and development plan
  7. Comparison — multi-engineer strengths comparison with evidence

Batch sizing is enforced from hard limits discovered in production:

| Commits per batch | Strategy |
|---|---|
| 1-40 | Direct read |
| 41-70 | Single agent |
| 71-90 | 2 parallel agents |
| 91+ | 3+ agents or monthly splits |

Project Structure

├── SKILL.md              # Agent entry point and routing
├── AGENTS.md             # Full compiled guide for agents
├── assets/
│   └── logo.svg          # Project logo
├── references/           # Progressive disclosure by topic
│   ├── onboarding.md
│   ├── contributor-analysis.md
│   ├── accuracy-analysis.md
│   ├── code-quality-catalog.md
│   ├── qualitative-judgment.md
│   ├── report-templates.md
│   ├── codebase-analysis.md
│   └── periodic-saving.md
├── scripts/
│   └── checkpoint.sh     # Save/resume/status/ratelimit helper
└── LICENSE

Design Decisions

Why every commit, not sampling?
Sampling misses the story. An engineer's best work might be a 12-line fix that prevents a payment double-charge. Sampling skips it. Reading every diff is what experienced code reviewers do — this skill encodes that expertise into a repeatable process.

Why an Agent Skill?
Analyzing a year of commits doesn't fit in a single AI session or context window. Agent Skills are self-contained expertise that save checkpoints, resume across sessions, and never re-read a diff already understood. This skill automates the reading — the thinking is still yours.

Why baselines, not scores?
Numbers like commit counts, lines changed, or accuracy rates aren't evaluations — they're baselines that surface where to look. A low accuracy rate doesn't mean a bad engineer — it often means they own the riskiest module in the system. Numbers open the door. Reading the code walks through it.

Why constructive framing?
Strong engineering cultures grow people — they don't grade them. Every label, every comparison, every recommendation is framed for growth: "Developing" not "Below Expectations," "Growth Areas" not "Blockers," strengths before concerns. The fairness checks aren't afterthoughts — they're load-bearing.

Why both GitHub and GitLab?
Enterprise teams don't live on one platform. Auto-detection from git remote -v means zero configuration — the skill adapts to whatever the team uses.

Constructive Use

This tool reads code to grow engineers, not judge them.

Do:
- Use findings to start development conversations, not deliver verdicts
- Share reports WITH the engineer, not just about them
- Recognize strengths before discussing growth areas
- Consider repository and project context
- Consider team context: deadlines, on-call, team changes, unfamiliar codebases

Don't:
- Analyze commits in isolation — diffs without project context are noise
- Use metrics to justify termination without human context
- Compare engineers as a ranking exercise — compare for complementary strengths
- Treat anti-patterns as character flaws — they're often process or tooling gaps
- Share individual accuracy rates publicly or competitively
- Run analysis without the engineer's awareness

Fairness checks built in:
- Same time window and scope for all contributors
- Fix-related commits analyzed for root cause (inherited bug vs introduced)
- Low-activity periods flagged for context (leave, onboarding, cross-team work)
- "Peak then decline" flagged as "support needed," not penalized
- All labels are growth-oriented: "Developing" not "Below Expectations"

License

MIT

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.