Use when adding new error messages to React, or seeing "unknown error code" warnings.
npx skills add trotsky1997/My-Claude-Agent-Skills --skill "codebase-reading"
Install specific skill from multi-skill repository
# Description
Systematic methodology for reading and understanding large codebases efficiently. Use when (1) Understanding a new or unfamiliar codebase quickly, (2) Preparing to modify or extend existing code safely, (3) Debugging complex issues requiring deep code understanding, (4) Onboarding new team members to a codebase, (5) Performing code audits or security reviews, (6) Refactoring legacy code with confidence, (7) Creating documentation for existing systems, (8) Tracing execution flows and data transformations
# SKILL.md
name: codebase-reading
description: Systematic methodology for reading and understanding large codebases efficiently. Use when (1) Understanding a new or unfamiliar codebase quickly, (2) Preparing to modify or extend existing code safely, (3) Debugging complex issues requiring deep code understanding, (4) Onboarding new team members to a codebase, (5) Performing code audits or security reviews, (6) Refactoring legacy code with confidence, (7) Creating documentation for existing systems, (8) Tracing execution flows and data transformations
metadata:
short-description: Efficiently read and understand large codebases
Large Codebase Reading Methodology
A systematic approach to understanding large codebases efficiently: read to modify safely, not to memorize everything.
Core Principle
Goal-oriented reading: Always start with a concrete objective (fix bug, add feature, debug issue, security audit). Without a goal, you'll get lost in details.
Success criteria: You can explain the execution flow and modify code confidently, not that you've read every file.
Documentation Structure
When documenting your understanding, use this structure:
code-reading/
βββ README.md # Index and progress tracking
βββ code-reading.md # Main framework (methodology + findings + terminology)
βββ architecture.md # C4 architecture map (Level 1-4)
βββ api-flow.md # Execution flow tracing
βββ key-modules.md # Detailed module analysis
Progressive disclosure:
- README: Overview and navigation
- Main doc: Methodology and high-level findings
- Specialized docs: Detailed analysis (loaded only when needed)
Terminology section:
- Include a dedicated "Terminology" or "Key Concepts and Terminology" section in code-reading.md (e.g., section 9)
- Build incrementally from day 1, don't wait until everything is understood
- Cross-reference terms throughout documentation (architecture, flows, modules)
- Maintain as a living document - update as understanding deepens
Quick Start
Step 1: Define Your Goal
Start with a concrete, one-sentence objective:
- β Good: "Trace HTTP request from route handler to database and back"
- β Good: "Understand how authentication tokens are validated and refreshed"
- β Bad: "Understand the entire codebase"
Step 2: Run the Project First
First hour priority:
- Read README - project purpose, setup, minimal example
- Read CONTRIBUTING/development guide - testing, structure, workflow
- Run a minimal path - local build, one test, or demo
Why: Running code transforms reading from guessing to verification.
Step 3: Map the Architecture (C4 Thinking)
Draw a rough map from far to near (no need to be perfect):
Level 1: System Context - Who uses it? What external dependencies?
Level 2: Containers - What are the deployable units? (services, databases, workers)
Level 3: Components - What are the key components within one container?
You don't need beautiful diagrams - just enough to navigate: where's the entry point, how does data flow, where are the boundaries?
Step 4: Trace a Real Request Path
Instead of reading modules, read one execution path:
- Web: route β controller β service β repository β external API
- CLI: main β command parser β execution path
- Async: consumer β handler β processing β ack/retry
Strongly recommended: Use a debugger, add logging, set breakpoints to walk through once.
Step 5: Treat Tests as Executable Documentation
If code is old/hard to test, write characterization tests first - record current behavior, then refactor under protection.
Core Workflow
1. Goal-Oriented Setup
Define your reading task:
Write a one-sentence goal that describes what you want to achieve:
- "I can explain the request flow from HTTP to database"
- "I can safely modify the authentication module"
- "I can trace the OCR pipeline from image input to result output"
Avoid: Reading without purpose or starting from directory trees.
2. Project Bootstrap
Priority order (first hour):
- README.md - What does it do? How to start? Minimal example?
- CONTRIBUTING.md / docs/ - How to test? Code structure? Branch strategy?
- Run minimal path - Can you build it? Run one test? Execute a demo?
Verify: Model files exist, tests run, examples work.
3. Architecture Mapping (C4 Model)
Level 1: System Context
- Users/applications that interact with the system
- External dependencies (databases, APIs, services)
- System boundaries
Level 2: Containers
- Deployable units (frontend, API server, workers, databases, queues)
- Communication between containers
Level 3: Components
- Key components within a container (auth service, domain service, repository, adapter)
- Component relationships and dependencies
Output: A rough map that answers:
- Where's the entry point?
- How does data flow?
- Where are the boundaries?
4. Entry Point + Request Tracing
Find the entry point:
- Web: main(), route handlers, controllers
- CLI: main(), command parsers
- Library: public API functions, constructors
Trace one complete path:
- Use debugger to step through
- Add logging at key points
- Set breakpoints to verify understanding
Document the flow: Write down the execution path as you trace it.
5. Tests as Documentation
Existing tests:
- Read tests to understand expected behavior
- Tests show how components are used
- Tests document edge cases and error handling
Missing tests:
- Write characterization tests (record current behavior)
- Use tests as a safety net before modifications
- Tests become regression suite
6. Git Archaeology
When you see "weird code":
Don't dismiss it immediately - ask: What historical problem is this solving?
Commands:
git blame -w -- path/to/file # Who changed this and when?
git log -p -- path/to/file # Full change history
git log --grep="keyword" # Find related commits
git show <commit-hash> # View specific change
What to look for:
- Commit messages explaining why
- PR discussions showing trade-offs
- Major refactors showing architecture evolution
7. Use Tools Efficiently
Code search:
rg "keyword" -n . # Ripgrep (faster than grep)
git grep "keyword" # Git-optimized search
rg "main\(" . # Find entry points
rg "TODO|FIXME" . # Find todos
Git tools:
git log --graph --oneline --all # Visual history
git log --follow -- path/to/file # File rename tracking
git diff HEAD~5 HEAD # Compare versions
Language-specific tools:
- Rust: cargo tree, cargo clippy, cargo doc
- Python: pytest, mypy, pylint
- JavaScript: npm list, eslint, tsc
8. Build and Maintain Terminology Glossary
Critical for understanding: Build a living terminology glossary from day 1, not as an afterthought.
Why it matters:
- Domain-specific terms (OCR: detection, recognition, CTC, NMS)
- Acronyms that need expansion (CLS, DB, IOU)
- Project-specific abstractions (custom types, internal concepts)
- Confusion between similar concepts (Orientation vs CLS, Detection vs Recognition)
When to build:
- Initial collection: During first reading pass (Steps 1-4) - collect terms as you encounter them
- Deep dive enrichment: During detailed module analysis (Step 10) - add technical details and context
- Continuous maintenance: Update as understanding deepens, add cross-references, refine definitions
What to include:
- Definition: Clear explanation of what the term means
- Context: Where and how it's used in the codebase
- Code references: File paths, function names, line numbers
- Relationships: Related terms, parent/child concepts, synonyms
- Examples: Usage examples, code snippets when helpful
Organization:
- Classify by domain (OCR terms, ML terms, project-specific)
- Group by abstraction level (concepts, algorithms, data structures, implementation)
- Cross-reference to architecture diagrams, API flows, module analysis
Example structure:
## Terminology Glossary
### Domain-Specific Terms
- **Detection (ζ£ζ΅)**: Locating text regions in images
- **Implementation**: `src/det.rs`
- **Related**: Recognition, NMS, DB
- **See**: `api-flow.md` section 4.1
### Data Structures
- **`Mat`**: Image matrix abstraction
- **Implementation**: `src/image_impl.rs`
- **Purpose**: Unified image representation for Pure Rust and OpenCV backends
### Algorithms
- **NMS (Non-Maximum Suppression)**: Algorithm for filtering overlapping boxes
- **Implementation**: `src/geometry.rs::nms()`
- **Related**: IOU (Intersection over Union)
Tools for term extraction:
# Extract acronyms (all caps, 2-5 chars)
rg "\b[A-Z]{2,5}\b" README.md docs/ | sort -u
# Extract struct/enum/trait names
rg "^(pub )?(struct|enum|trait|type) \w+" src/ -o | sort -u
# Find config-related terms
rg "Config.*\{|struct \w+Config" src/
Maintenance checklist:
- [ ] All acronyms have expansions
- [ ] All domain terms have clear definitions
- [ ] All key data structures are documented
- [ ] Cross-references to code are accurate
- [ ] Related terms are linked
- [ ] Examples provided for complex terms
Common Patterns
Pattern 1: Reading Without a Goal
Problem: Reading files alphabetically or by directory structure.
Solution: Start with a concrete task. Even if it's "understand how X works," make it specific.
Pattern 2: Getting Lost in Details
Problem: Deep-diving into every module before understanding the flow.
Solution: First trace one complete execution path. Then dive into specific modules as needed.
Pattern 3: Ignoring Tests
Problem: Treating tests as optional or skipping them.
Solution: Tests are the most reliable documentation. Read them first. If missing, write characterization tests.
Pattern 4: Not Using Git History
Problem: Seeing "weird code" and assuming it's wrong.
Solution: Use git blame and git log to understand context. Code exists for reasons, even if not obvious.
Pattern 5: Ignoring Terminology (Acronyms and Domain Terms)
Problem: Encountering acronyms (CTC, CLS, DB, NMS) or domain terms (Detection vs Recognition) and assuming you'll remember what they mean later. Repeatedly looking up the same terms.
Solution: Build a terminology glossary from day 1. Collect terms as you encounter them, enrich with context as understanding deepens. Cross-reference terms throughout documentation. Treat it as a living document, not a one-time task.
Success Indicators
You've successfully understood a codebase when:
β
You can explain the main execution flow from entry to exit
β
You can locate where specific functionality is implemented
β
You can trace data transformations through the system
β
You can identify where to make changes for your goal
β
You can explain architectural decisions (even if you'd do differently)
β
You have a comprehensive terminology glossary that helps navigate the codebase
β
You can explain any domain-specific term in one sentence
β
You rarely need to re-lookup the same term
β
You feel confident modifying code without breaking things
Not required:
- β Having read every file
- β Memorizing every function
- β Understanding every detail
Troubleshooting
"I don't know where to start"
β Define a concrete goal. Even "understand how user authentication works" is better than "understand everything."
"The codebase is too large"
β Focus on your goal. Trace one execution path. Ignore unrelated modules.
"I can't find the entry point"
β Look for main(), route definitions, or public API functions. Use code search tools.
"The code doesn't make sense"
β Use Git history to understand why it's written this way. Check tests for usage examples.
"There are no tests"
β Write characterization tests. Record current behavior before modifying.
"I keep forgetting what CTC/CLS/DB means"
β Build a terminology glossary. Start early, collect terms as you encounter them. Include definitions, context, code references, and relationships. Cross-reference throughout documentation.
"The domain terms are confusing"
β Don't just look them up once. Document them with context, usage examples, and relationships to other terms. Classify by domain and abstraction level. Update as understanding deepens.
References
For detailed methodology and examples, see:
- methodology.md - Complete 11-step methodology with detailed explanations
- tool-checklist.md - Comprehensive tool checklist by category (search, Git, language-specific, debugging)
- terminology-building.md - Complete guide to building and maintaining terminology glossaries (5-step process, classification methods, best practices, common patterns)
# Supported AI Coding Agents
This skill is compatible with the SKILL.md standard and works with all major AI coding agents:
Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.