Evelyn5410

github-knowledge-base

# Install this skill:
npx skills add Evelyn5410/github-knowledge-base

Or install a specific skill: npx add-skill https://github.com/Evelyn5410/github-knowledge-base

# SKILL.md


---
name: github-knowledge-base
version: 1.1.0
description: >
  Build and explore a personal knowledge base of GitHub repositories. Discover repos by topic,
  track API changes, clone and analyze codebases, search for implementation patterns, and compare
  different approaches across your collection.

  BONUS FEATURES for code reviews: Smart book detection (Clean Code, Refactoring, Design Patterns, etc.)
  uses Claude's training data instead of reading PDFs (saves 40K+ tokens per book). PDF summarization
  available for research papers and niche documents (80-90% token savings on re-reads).

  Use this skill when the user wants to: add/remove repos, search GitHub, explore codebases, track API changes,
  compare implementations, or review code using technical book principles.

  Triggers: "add repo", "find repos for", "search my KB", "how does X handle Y", "compare repos",
  "explore this repo", "track changes", "what's new in", "review code using Clean Code".

author: Eve
tags: [github, knowledge-base, code-exploration, search, repositories, code-review, api-tracking, token-optimization]
---


GitHub Knowledge Base Skill

Build and maintain a personal knowledge base of GitHub repositories for code exploration, learning, and API tracking. Includes smart features for token-efficient code reviews.

Core Capabilities

📚 GitHub Repository Knowledge Base

  1. Repository Management - Add, tag, and organize GitHub repositories
  2. Discovery - Search GitHub and find related repositories
  3. Exploration - Clone and analyze repository structure
  4. Code Search - Search for patterns across your knowledge base
  5. Comparison - Compare how different repos solve similar problems
  6. Change Tracking - Monitor API changes, breaking changes, and releases

🎯 Smart Code Reviews (Token-Optimized)

  1. Known Books Detection - Auto-detect popular technical books (Clean Code, Refactoring, etc.) already in Claude's training data to avoid wasting 40,000-60,000 tokens per book
  2. Smart PDF Summarization - Create structured summaries that save 80-90% tokens on repeated reads
  3. Token Cost Transparency - Every PDF shows estimated token cost before reading

Persistent Storage

All data is stored in ~/.config/github-kb/:
- index.json - Registry of all repositories with metadata
- repos/ - Cloned repositories
- notes/ - Personal notes and PDFs
- notes/pdf_index.json - PDF metadata with token estimates
- notes/*.summary.md - Token-efficient PDF summaries
- cache/ - Cached API responses
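
The layout above can be expressed as a small path helper. This is a minimal sketch, not code from the skill: `kb_paths` and `repo_clone_dir` are hypothetical names, and the `owner__name` directory convention is an assumption inferred from the example path `~/.config/github-kb/repos/facebook__react/` used in the PDF commands.

```python
from pathlib import Path

# Base directory named in the storage section.
KB_ROOT = Path.home() / ".config" / "github-kb"


def kb_paths(root: Path = KB_ROOT) -> dict:
    """Return the documented locations inside the knowledge base directory."""
    return {
        "index": root / "index.json",
        "repos": root / "repos",
        "notes": root / "notes",
        "pdf_index": root / "notes" / "pdf_index.json",
        "cache": root / "cache",
    }


def repo_clone_dir(full_name: str, root: Path = KB_ROOT) -> Path:
    """Map 'owner/name' to its clone directory.

    Assumption: '/' is replaced by '__', as in repos/facebook__react.
    """
    owner, name = full_name.split("/", 1)
    return root / "repos" / f"{owner}__{name}"
```

Passing a different `root` makes it easy to point the helpers at a scratch directory when experimenting.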

Available Commands

Command Shortcuts: Users can optionally install command shortcuts (kb, kb-search, kb-explore, kb-changes) by running ./install-commands.sh. When invoking commands, prefer using the short form if available (e.g., kb add instead of python kb.py add). Both forms work identically.

KB Management (kb / kb.py)

# Add a repository
python kb.py add facebook/react
python kb.py add https://github.com/facebook/react

# List repositories
python kb.py list
python kb.py list --tag frontend
python kb.py list --status explored

# Tag repositories
python kb.py tag facebook/react frontend ui library

# Add notes
python kb.py note facebook/react "Great hooks implementation"

# Set status (bookmarked, exploring, explored, archived)
python kb.py status facebook/react explored

# Get info
python kb.py info facebook/react
python kb.py stats

# Remove repository
python kb.py remove facebook/react
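
The add command accepts either `owner/name` or a full GitHub URL. A minimal sketch of how both forms could be reduced to one key follows; `normalize_repo` is a hypothetical helper for illustration, not a function taken from kb.py.

```python
import re


def normalize_repo(identifier: str) -> str:
    """Return 'owner/name' for either 'owner/name' or a github.com URL."""
    text = identifier.strip()
    # Strip an optional scheme and host, e.g. https://github.com/
    text = re.sub(r"^(?:https?://)?(?:www\.)?github\.com/", "", text)
    # Drop a trailing slash or .git suffix left over from clone URLs.
    text = text.rstrip("/")
    if text.endswith(".git"):
        text = text[:-4]
    match = re.fullmatch(r"([A-Za-z0-9_.-]+)/([A-Za-z0-9_.-]+)", text)
    if match is None:
        raise ValueError(f"not a GitHub repository identifier: {identifier!r}")
    return f"{match.group(1)}/{match.group(2)}"
```

Normalizing up front means `facebook/react` and `https://github.com/facebook/react` would resolve to the same index entry.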

Change Tracking (kb_changes.py)

# Show latest release and commits
python kb_changes.py latest facebook/react
python kb_changes.py latest facebook/react --detailed  # With change analysis

# Show changelog
python kb_changes.py changelog facebook/react

# Track API changes (detects property renames, function changes, etc.)
python kb_changes.py api-changes facebook/react
python kb_changes.py api-changes facebook/react --pattern "*.ts"

# Compare versions
python kb_changes.py compare facebook/react v17.0.0 v18.0.0

# Watch for updates
python kb_changes.py watch facebook/react

# Check all watched repos for updates
python kb_changes.py updates
python kb_changes.py updates facebook/react  # Check specific repo

Search & Discovery (kb_search.py)

# Search GitHub
python kb_search.py github "react state management" --stars ">1000" --language typescript

# Find related repositories
python kb_search.py related facebook/react

# Search code in your KB
python kb_search.py code "useEffect" --tag frontend
python kb_search.py code "handleError" --repo facebook/react

# Compare implementations
python kb_search.py compare facebook/react preactjs/preact "virtual dom"

Repository Exploration (kb_explore.py)

# Clone repository
python kb_explore.py clone facebook/react
python kb_explore.py clone facebook/react --depth 1  # shallow clone

# Sync (pull) repository
python kb_explore.py sync facebook/react

# Analyze structure
python kb_explore.py analyze facebook/react

# Show directory tree
python kb_explore.py tree facebook/react --depth 3

# View README
python kb_explore.py readme facebook/react

# Find documentation
python kb_explore.py docs facebook/react

# Find entry points
python kb_explore.py entry-points facebook/react

# Find tests
python kb_explore.py find-tests facebook/react

PDF Management (kb_pdf.py)

# Add PDF from local file
python kb_pdf.py add ~/Documents/react-internals.pdf --title "React Internals Guide" --tags react architecture

# Add PDF from cloned repository
python kb_pdf.py scan-repo facebook/react  # Find PDFs in repo
python kb_pdf.py add ~/.config/github-kb/repos/facebook__react/docs/Architecture.pdf --source facebook/react

# Remove PDF from knowledge base (original file not affected)
python kb_pdf.py remove react-internals.pdf

# List all PDFs
python kb_pdf.py list
python kb_pdf.py list --tag architecture

# Get PDF info (shows token estimate)
python kb_pdf.py info react-internals.pdf

# Search PDFs by title/tags
python kb_pdf.py search "react"

# Create token-efficient summary
python kb_pdf.py summarize react-internals.pdf

# Tag PDFs for organization
python kb_pdf.py tag react-internals.pdf frontend performance

Known Books Detection (kb_books.py)

Smart Token Management: Automatically detects when PDFs are popular technical books already in Claude's training data, preventing token waste.

# List all known books
python kb_books.py list

# Search for a book
python kb_books.py search "clean code"
python kb_books.py search "refactoring"

# Check if a book is known (before adding PDF)
python kb_books.py check "Clean Code by Robert Martin"

# View curated combinations
python kb_books.py combos

# Show combination details with ready-to-use prompts
python kb_books.py combo clean-code-fundamentals
python kb_books.py combo java-mastery
python kb_books.py combo software-architecture
python kb_books.py combo craftsmanship

Currently Known Books: Clean Code, Refactoring, Design Patterns, Clean Architecture, Effective Java, Effective Python, The Pragmatic Programmer, Domain-Driven Design
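
The check step can be pictured as a title match against the list above. The sketch below is an assumption about how such a match might work, not the logic in kb_books.py; `check_known_book` is a hypothetical name, and the real known_books.json may store more fields than a title.

```python
import re

# The eight titles listed above.
KNOWN_BOOKS = [
    "Clean Code",
    "Refactoring",
    "Design Patterns",
    "Clean Architecture",
    "Effective Java",
    "Effective Python",
    "The Pragmatic Programmer",
    "Domain-Driven Design",
]


def _words(text):
    """Lowercase, split on non-alphanumerics, and drop the article 'the'."""
    tokens = re.sub(r"[^a-z0-9]+", " ", text.lower()).split()
    return [t for t in tokens if t != "the"]


def check_known_book(query):
    """Return the known title whose words all appear in the query, else None."""
    query_words = set(_words(query))
    best = None
    for title in KNOWN_BOOKS:
        title_words = _words(title)
        if all(word in query_words for word in title_words):
            # Prefer the title with the most matching words.
            if best is None or len(title_words) > len(_words(best)):
                best = title
    return best
```

A query such as "Clean Code by Robert Martin" matches "Clean Code", while "React Fiber Architecture" matches nothing and would be treated as safe to add.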

Workflow Instructions for Claude

When the user wants to add a repository:

  1. Use kb.py add <repo> to add it to the knowledge base
  2. Show the summary and metadata returned
  3. Suggest related repositories they might want to add using kb_search.py related <repo>
  4. Recommend next steps: cloning, tagging, or adding notes

Example:

User: "Add the fastify repo to my knowledge base"

Steps:
1. Run: kb add fastify/fastify
2. Show the repository summary, stars, language
3. Run: kb-search related fastify/fastify --limit 5
4. Suggest: "I've added Fastify. You might also want to add Express, Koa, or Hapi as related frameworks."

When the user wants to find repositories:

  1. Use kb_search.py github <query> with appropriate filters
  2. Present the results with stars, language, and descriptions
  3. Highlight any repos already in their KB
  4. Offer to add promising ones

Example:

User: "Find me some good GitHub repos for rate limiting in Node.js"

Steps:
1. Run: kb-search github "rate limiting nodejs" --stars ">500"
2. Present top results with context
3. Ask: "Would you like to add any of these to your knowledge base?"

When the user wants to explore a repository:

  1. First check whether the repo is in the KB; if not, suggest adding it
  2. Clone if not already cloned: kb_explore.py clone <repo>
  3. Analyze structure: kb_explore.py analyze <repo>
  4. Show README: kb_explore.py readme <repo>
  5. Find key files: kb_explore.py entry-points <repo> and kb_explore.py docs <repo>

Example:

User: "Help me understand the architecture of the react repo"

Steps:
1. Check whether React is in the KB; if not, run: kb add facebook/react
2. Clone if needed: kb-explore clone facebook/react
3. Run: kb-explore analyze facebook/react
4. Run: kb-explore tree facebook/react --depth 2
5. Run: kb-explore readme facebook/react
6. Explain the structure based on the output
7. Suggest: kb status facebook/react exploring

When the user wants to search their KB:

  1. Use kb_search.py code <pattern> with appropriate filters
  2. Show code snippets from matching repositories
  3. Provide context about which repos matched and why

Example:

User: "What repos in my KB handle error handling? Show me examples"

Steps:
1. Run: python kb_search.py code "error|Error|handleError" --tag backend
2. Present the code snippets with file paths
3. Summarize the different approaches found
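
A pattern such as "error|Error|handleError" is a regular expression applied line by line to the cloned sources. The sketch below illustrates that idea only; it is not the implementation of kb_search.py, and `search_code` and its default suffix list are hypothetical.

```python
import re
from pathlib import Path


def search_code(root: Path, pattern: str, suffixes=(".js", ".ts", ".py")):
    """Yield (path, line_number, line) for lines under root that match pattern."""
    regex = re.compile(pattern)
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than abort the search
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                yield path, number, line.strip()
```

Returning the file path and line number with each hit is what lets the results be presented as snippets with file paths, as in step 2.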

When the user wants to compare implementations:

  1. Ensure both repos are cloned
  2. Use kb_search.py compare <repo1> <repo2> <pattern>
  3. Analyze and explain the differences

Example:

User: "Compare how Express and Fastify handle middleware"

Steps:
1. Check if both are in KB and cloned
2. Run: python kb_search.py compare expressjs/express fastify/fastify "middleware"
3. Show code from both repos
4. Explain the architectural differences

When the user wants to see their collection:

  1. Use kb.py list with appropriate filters
  2. Show the organized list
  3. Offer to explore specific repos or categories

Example:

User: "Show me all my frontend repos"

Steps:
1. Run: python kb.py list --tag frontend
2. Display the results
3. Ask: "Would you like to explore any of these in detail?"

When the user wants to track changes or updates:

  1. Use kb_changes.py latest <repo> to show recent releases and commits
  2. Add --detailed for automatic change analysis (breaking changes, API changes, etc.)
  3. Use kb_changes.py api-changes to detect property renames and API modifications
  4. Use kb_changes.py watch to track repositories for future updates

Example:

User: "What's new in React?"

Steps:
1. Run: kb-changes latest facebook/react --detailed
2. Show the latest release with analysis
3. Highlight breaking changes, new features, and API changes
4. Note any naming convention changes (e.g., camelCase to snake_case)

Example:

User: "Has the API changed in version 18?"

Steps:
1. Run: python kb_changes.py compare facebook/react v17.0.0 v18.0.0
2. Run: python kb_changes.py api-changes facebook/react
3. Show detected API changes including property renames
4. Explain the impact of changes

Example:

User: "Keep me updated on Next.js changes"

Steps:
1. Run: python kb_changes.py watch vercel/next.js
2. Confirm watching
3. Later: python kb_changes.py updates
4. Show any new releases or commits

Best Practices for Claude

  1. Be conversational - Don't just run commands, explain what you're doing and why
  2. Be helpful - Suggest next steps and related actions
  3. Be efficient - Run multiple commands in sequence when it makes sense
  4. Handle errors gracefully - If a repo isn't cloned, offer to clone it
  5. Suggest organization - Recommend tags, status updates, and notes
  6. Make connections - Point out related repos and patterns
  7. Educate - Explain what you find in the repositories

GitHub API Rate Limits

  • Without token: 60 requests/hour
  • With GITHUB_TOKEN: 5000 requests/hour

If rate limited, inform the user and suggest setting GITHUB_TOKEN:

export GITHUB_TOKEN=your_github_token_here
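
To see how much quota remains before a long session, the GitHub REST API exposes a `/rate_limit` endpoint. The following is a minimal sketch using only the standard library; the function names and the User-Agent string are illustrative choices, not part of the skill's scripts.

```python
import json
import urllib.request

RATE_LIMIT_URL = "https://api.github.com/rate_limit"


def build_headers(token=None):
    """Build request headers; the Authorization header is added only with a token."""
    headers = {
        "Accept": "application/vnd.github+json",
        "User-Agent": "github-kb-sketch",  # illustrative value
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def remaining_requests(payload):
    """Extract (remaining, limit) for the core REST API from a /rate_limit response."""
    core = payload["resources"]["core"]
    return core["remaining"], core["limit"]


def fetch_rate_limit(token=None):
    """Query GitHub and return (remaining, limit); performs a network request."""
    request = urllib.request.Request(RATE_LIMIT_URL, headers=build_headers(token))
    with urllib.request.urlopen(request, timeout=10) as response:
        return remaining_requests(json.load(response))
```

Calling `fetch_rate_limit(os.environ.get("GITHUB_TOKEN"))` after `import os` reports the quota for the current credentials, so the unauthenticated and authenticated limits can be compared directly.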

Common User Intents and Responses

| User Says | What to Do |
| --- | --- |
| "Add [repo] to my KB" | kb.py add, then suggest related repos |
| "Find repos for [topic]" | kb_search.py github with appropriate filters |
| "What's in my KB?" | kb.py list or kb.py stats |
| "Explore [repo]" | Clone (if needed), analyze, show structure |
| "How does [repo] handle [topic]?" | Clone (if needed), search code for topic |
| "Compare [repo1] and [repo2]" | Ensure both cloned, use compare command |
| "Search my repos for [pattern]" | kb_search.py code |
| "What else should I add?" | Suggest based on existing repos and their tags |
| "Show my [tag] repos" | kb.py list --tag |
| "What's new in [repo]?" | kb_changes.py latest with detailed analysis |
| "Has [repo] been updated?" | kb_changes.py updates or check latest changes |
| "What changed in version X?" | kb_changes.py compare between versions |
| "Show me breaking changes" | kb_changes.py latest --detailed, highlight breaking changes |
| "Track API changes" | kb_changes.py api-changes to detect renames and modifications |
| "Watch [repo] for updates" | kb_changes.py watch to track future changes |

Error Handling

  • If KB not initialized: Scripts will auto-initialize on first use
  • If repo not in KB: Suggest adding it first
  • If repo not cloned: Offer to clone it
  • If API rate limited: Suggest using GITHUB_TOKEN
  • If command fails: Explain the error and suggest fixes

Tips for Effective Use

  1. Start with discovery - Search GitHub first, then add interesting repos
  2. Tag consistently - Use tags like "frontend", "backend", "database", "auth", etc.
  3. Clone selectively - Only clone repos you want to explore deeply
  4. Use shallow clones - For large repos, use --depth 1 to save space
  5. Keep notes - Add context about why you added a repo and what you learned
  6. Track status - Update status as you progress through exploration
  7. Compare similar repos - Learn different approaches to the same problems

Example Conversations

Example 1: Adding and Exploring

User: Add the Next.js repo
Claude: [Runs kb.py add vercel/next.js]
        Added Next.js! It's a React framework with 120k+ stars.
        Related repos you might want to add: remix-run/remix, gatsbyjs/gatsby
        Want me to clone and explore it?

User: Yes, explore it
Claude: [Runs kb_explore.py clone vercel/next.js]
        [Runs kb_explore.py analyze vercel/next.js]
        Next.js has a well-organized monorepo structure...
        [Explains the architecture]
        I've set the status to 'exploring'. Want me to find the entry points?

Example 2: Research and Discovery

User: I need to implement auth in my Node.js app. Find me some good auth libraries.
Claude: [Runs kb_search.py github "nodejs authentication" --stars ">2000"]
        Found several excellent auth libraries:
        1. passport (23k stars) - Simple authentication middleware
        2. jsonwebtoken (17k stars) - JWT implementation
        3. bcrypt (6k stars) - Password hashing
        Would you like to add any of these to your KB?

User: Add passport
Claude: [Runs kb.py add jaredhanson/passport]
        [Runs kb_search.py related jaredhanson/passport]
        Added passport! Also found 200+ passport strategy plugins.
        Popular ones: passport-local, passport-google-oauth20, passport-jwt
        Want to add any of these?

Example 3: Code Search and Comparison

User: How do frameworks handle middleware? Search my KB
Claude: [Runs kb_search.py code "middleware|use\(" --tag backend]
        Found middleware implementations in 4 repos:

        Express: Uses .use() method with simple function composition
        Fastify: Uses plugin system with encapsulation
        Koa: Uses async/await with context object

        [Shows code snippets]

        Each has a different approach. Want me to do a detailed comparison?

User: Compare Express and Fastify
Claude: [Runs kb_search.py compare expressjs/express fastify/fastify "middleware"]
        [Shows side-by-side code]

        Key differences:
        - Express: Synchronous by default, error handling via next()
        - Fastify: Async-first, uses plugin encapsulation
        - Fastify is generally faster, in part due to schema-based validation and serialization

Example 4: Adding PDFs with Smart Detection

User: I downloaded the Clean Code PDF. Can I add it to my knowledge base?
Claude: Let me check if that's a known book first...
        [Runs kb_books.py check "Clean Code"]

        ⚠️  TOKEN ALERT!

        Clean Code by Robert C. Martin is already in my training data!
        Adding the PDF would waste ~45,000 tokens.

        Instead, you can ask me directly:
        - "Summarize the main principles from Clean Code by Robert Martin"
        - "Review this code using Clean Code principles"
        - "Explain the naming conventions from Clean Code"

        Want to see curated combinations? Try:
        kb-books combo clean-code-fundamentals

User: Show me the clean code fundamentals combination
Claude: [Runs kb_books.py combo clean-code-fundamentals]

        📦 Clean Code Fundamentals
        Essential books for writing clean, maintainable code

        📚 Included Books:
        ✅ Clean Code
        ✅ Refactoring
        ✅ Design Patterns
        ✅ Clean Architecture

        📖 Learning Path:
        1. Start with Clean Code - learn the basics
        2. Study Refactoring - learn to improve existing code
        3. Master Design Patterns - learn reusable solutions
        4. Apply Clean Architecture - learn system-level organization

        🎯 Ready-to-use prompt:
        "Review this code using principles from Clean Code, Refactoring,
         Design Patterns, and Clean Architecture"

User: Review my authentication code using those principles
Claude: [Reviews code using Clean Code, Refactoring, Design Patterns, and
        Clean Architecture principles from training data - no PDF needed!]

        Based on these four books:

        Clean Code perspective:
        - Functions are too long (should be 5-10 lines)
        - Variable names like 'usr' should be 'user'

        Refactoring perspective:
        - Extract Method: auth logic should be separate function
        - Replace Conditional with Polymorphism for auth strategies

        Design Patterns perspective:
        - Strategy pattern for different auth methods
        - Factory pattern for creating auth handlers

        Clean Architecture perspective:
        - Auth logic mixed with infrastructure
        - Should be in use case layer, not controller

Example 5: PDF Management for Research Papers

User: I have a research paper on React Fiber architecture. Can I add it?
Claude: [Runs kb_books.py check "React Fiber Architecture"]

        ✅ 'React Fiber Architecture' is not a known book.
        You can safely add this PDF to your knowledge base.

        [User proceeds]

User: Add the PDF
Claude: [Runs kb_pdf.py add ~/Downloads/fiber-arch.pdf --title "React Fiber Architecture"]

        ✓ Added PDF to knowledge base
        Title: React Fiber Architecture
        Size: 125.34 KB
        Estimated pages: ~2
        Estimated tokens: ~18,750

        ⚠️  Large PDF detected!
        Reading this entire PDF will consume ~18,750 tokens

        Consider creating a summary first:
          kb-pdf summarize fiber-arch.pdf

User: Create the summary
Claude: [Runs kb_pdf.py summarize fiber-arch.pdf]
        [Creates structured summary with sections]

        ✓ Created summary: fiber-arch.summary.md
        Summary tokens: ~1,850 (90% savings!)

        Now when you ask questions about Fiber, I'll read the summary
        instead of the full PDF - much more efficient!

When to Use PDFs vs Known Books

Use PDF:
- Research papers
- Internal documentation
- Unpublished content
- Niche technical documents
- Books NOT in Claude's training data

Use Known Books Feature:
- Popular technical books (Clean Code, Refactoring, etc.)
- Gang of Four Design Patterns
- Effective Java, Pragmatic Programmer
- Widely read books published before 2024
- Standard computer science texts

Workflow: Always run kb-books check "<title>" before adding a PDF to avoid token waste.

Extending the Knowledge Base

As you use this skill, consider:

  • Creating categories - Organize by topic, language, or use case
  • Building learning paths - Track related repos for learning a topic
  • Documenting patterns - Note common patterns across repos
  • Sharing insights - Export your notes and key findings

Script Locations

All scripts are in the skill's scripts/ directory:
- scripts/kb.py - Main management script
- scripts/kb_search.py - Search and discovery
- scripts/kb_explore.py - Repository exploration
- scripts/kb_changes.py - Change tracking and analysis
- scripts/kb_pdf.py - PDF management with smart token optimization
- scripts/kb_books.py - Known books detection and prompt generation
- scripts/known_books.json - Database of popular technical books

Run scripts with Python 3.7+. They have no external dependencies except Git.

Additional Resources

  • references/workflows.md - Common workflow examples
  • references/github-api.md - GitHub API reference and optimization
  • references/change-tracking.md - Detailed change tracking workflows

Remember: This skill is about building knowledge, not just collecting repos. Help the user learn from the code, understand patterns, and make informed decisions.

# README.md

GitHub Knowledge Base Skill

A token-optimized Claude Code skill for building and maintaining a personal knowledge base of GitHub repositories and technical documents.


💭 Why This Skill Exists

This started as a simple, personal need: I kept visiting the same GitHub repositories and noticed Google's GenAI documentation wasn't always up-to-date with the latest API changes. I found myself manually checking for updates every time, which was frustrating.

So I built a tool to track API changes across repositories I frequently reference.

Then I realized: "If I'm already exploring these repos, why not use them for code reviews?" I could reference well-known projects to validate patterns and best practices.

That led to: "What about technical books for code reviews?" - I added PDF support to reference books like Clean Code and Refactoring during reviews.

Then came the lightbulb moment: "Wait... Claude already knows these books!" These popular technical books are in Claude's training data. Reading the PDFs wastes 40,000+ tokens per book - and more importantly, wastes computational resources and energy.

What started as a simple API tracking tool evolved into something bigger: a knowledge base that helps you explore code efficiently - both in terms of your time and our planet's resources. Every token saved is energy not consumed, computation not wasted.

If this helps you learn from code while being mindful of resource consumption, that makes me happy. 🌱


What This Skill Does

📚 GitHub Repository Knowledge Base

  • Discover repositories based on topics and keywords
  • Track Changes - Monitor API changes, breaking changes, and releases
  • Explore repository structure and documentation
  • Search code across your saved repositories
  • Compare implementations between different repos
  • Manage a persistent collection with tags and notes
  • Learn from open source code systematically

🎯 Smart Code Reviews with Token Optimization

  • Known Books Detection - Auto-detect popular technical books (Clean Code, Refactoring, Design Patterns, etc.) already in Claude's training data - saves 40K-60K tokens per book
  • Smart PDF Summarization - Create structured summaries that save 80-90% tokens on repeated reads
  • Token Cost Transparency - See estimated token cost before reading any PDF
  • Computational Efficiency - Reduce unnecessary processing and environmental impact

🌟 What Makes This Different

1. Built for Exploration AND Efficiency

Problem: Reading large PDFs and technical books repeatedly wastes hundreds of thousands of tokens.

Solution:
- Structured summaries: 80-90% token reduction on subsequent reads
- Known books database: 8 popular technical books (Clean Code, Refactoring, etc.) already in Claude's training data - skip the PDF entirely!
- Break-even point: a summary pays for itself after just 2 uses

Example:

Full PDF reading (10 times): 450,000 tokens
With summaries (10 times): 50,000 tokens
Savings: 400,000 tokens (89%)
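
The break-even and cumulative-savings claims can be checked with a small cost model. This is a sketch under one assumption that the source does not state: building the summary costs one full read of the PDF, and every later read uses the summary instead. The 45,000-token book and 4,500-token summary (a 90% reduction) are illustrative figures.

```python
def cumulative_tokens(full_tokens, summary_tokens, reads):
    """Return (without_summary, with_summary) token totals for a number of reads.

    Assumption: the summary route pays for one full read up front,
    then reads only the summary each time.
    """
    without_summary = full_tokens * reads
    with_summary = full_tokens + summary_tokens * reads
    return without_summary, with_summary


def break_even_reads(full_tokens, summary_tokens):
    """Smallest number of reads at which the summary route is cheaper."""
    if summary_tokens >= full_tokens:
        raise ValueError("summary must be smaller than the full document")
    reads = 1
    while True:
        without_summary, with_summary = cumulative_tokens(
            full_tokens, summary_tokens, reads
        )
        if with_summary < without_summary:
            return reads
        reads += 1


without_summary, with_summary = cumulative_tokens(45_000, 4_500, reads=10)
savings = 1 - with_summary / without_summary
print(break_even_reads(45_000, 4_500), without_summary, with_summary, f"{savings:.0%}")
```

With these inputs the summary route becomes cheaper on the second read, and ten reads cost 90,000 tokens instead of 450,000, an 80% reduction. The total is higher than in a model that ignores the one-time cost of building the summary.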

2. Smart Detection

  • Automatically warns when you try to add a PDF of a known book
  • Provides ready-to-use prompts instead of wasting 40K+ tokens
  • Currently detects: Clean Code, Refactoring, Design Patterns, Clean Architecture, Effective Java, Effective Python, Pragmatic Programmer, Domain-Driven Design

3. Environmental Efficiency

  • Reduces computational waste from redundant processing
  • Token-aware operations minimize unnecessary API calls
  • Long-term ROI: 10x usage = 80% cumulative savings

Use Cases

🔍 Code Review & Quality Assurance

Review code against best practices from established projects:

# Build reference knowledge base
kb add jaredhanson/passport
kb add auth0/node-jsonwebtoken

# Search for patterns during review
kb-search code "passport.*strategy" --repo jaredhanson/passport
kb-search compare jaredhanson/passport auth0/node-jsonwebtoken "authentication"

# Check for recent security fixes
kb-changes latest jaredhanson/passport --detailed

📚 Learning New Technologies

Study frameworks systematically:

# Add and explore GraphQL ecosystem
kb add graphql/graphql-js
kb add apollographql/apollo-server

kb-explore clone graphql/graphql-js
kb-explore analyze graphql/graphql-js
kb-search code "resolver|schema" --tag graphql

⚖️ Framework Comparison

Make informed technology choices:

# Compare Express vs Fastify
kb add expressjs/express
kb add fastify/fastify

kb-search compare expressjs/express fastify/fastify "middleware"
kb-search compare expressjs/express fastify/fastify "performance"
kb-changes latest expressjs/express --detailed

🔄 Dependency Updates

Plan upgrades with confidence:

# Upgrading React 17 → 18
kb add facebook/react
kb-changes compare facebook/react v17.0.0 v18.0.0
kb-changes latest facebook/react --detailed  # Shows breaking changes
kb-search code "createRoot|useId" --repo facebook/react

🔒 Security Auditing

Audit application security:

# Add security references
kb add OWASP/CheatSheetSeries
kb add helmetjs/helmet

kb-search code "sanitize|csrf|xss" --tag security
kb-changes latest helmetjs/helmet --detailed

🚀 API Design Research

Design better APIs:

# Study well-designed APIs
kb add stripe/stripe-node
kb add twilio/twilio-node

kb-search code "pagination|error.*response" --tag api-design
kb-search compare stripe/stripe-node twilio/twilio-node "error"

🎯 Building Expertise

Become an expert in your domain:

# Node.js backend expertise
kb add nodejs/node
kb add expressjs/express
kb add prisma/prisma
kb add goldbergyoni/nodebestpractices

kb tag expressjs/express nodejs framework
kb-changes watch nodejs/node  # Track updates
kb-changes updates  # Weekly review

📊 Architecture Study

Learn architectural patterns:

# Microservices patterns
kb add nestjs/nest
kb add moleculerjs/moleculer

kb-search code "service.*discovery|event.*bus" --tag microservices
kb-explore tree nestjs/nest --depth 3

Installation

1. Install the Skill

claude skills add ./github-knowledge-base

Or from the skill directory:

cd github-knowledge-base
claude skills add .

For convenient short commands like kb add instead of python kb.py add:

cd github-knowledge-base
./install-commands.sh
source ~/.bashrc  # or ~/.zshrc, or restart terminal

This adds:
- kb - Repository management
- kb-search - Search & discovery
- kb-explore - Repository exploration
- kb-changes - Change tracking

Quick Start

First Steps

# 1. Check your KB (will be empty initially)
kb list

# 2. Add your first repository
kb add facebook/react

# 3. Tag and organize
kb tag facebook/react frontend library ui

# 4. View repository info
kb info facebook/react

# 5. See your collection
kb list
kb stats

Complete Example Workflow

Scenario: Learning React best practices

# Step 1: Build your knowledge base
kb add facebook/react
kb add vercel/next.js
kb add remix-run/remix

# Step 2: Organize with tags
kb tag facebook/react react framework frontend
kb tag vercel/next.js react framework ssr
kb tag remix-run/remix react framework fullstack

# Step 3: Clone for deep exploration
kb-explore clone facebook/react
kb-explore analyze facebook/react

# Step 4: Study specific patterns
kb-search code "useState|useEffect" --tag react
kb-search code "getServerSideProps" --repo vercel/next.js

# Step 5: Compare approaches
kb-search compare vercel/next.js remix-run/remix "data loading"

# Step 6: Track for updates
kb-changes watch facebook/react
kb-changes watch vercel/next.js

# Step 7: Check what's new
kb-changes latest facebook/react --detailed

Using with Claude Code

Ask Claude naturally - the skill activates automatically:

You: "Add the React repository to my knowledge base"
Claude: [Runs: kb add facebook/react]
       ✓ Added 'facebook/react' to knowledge base
       Summary: A declarative, efficient JavaScript library...

You: "What's new in the latest release?"
Claude: [Runs: kb-changes latest facebook/react --detailed]
       📦 Latest Release: v18.2.0
       ⚠️ Breaking Changes: ...

You: "Find authentication libraries for Node.js"
Claude: [Runs: kb-search github "nodejs authentication" --stars ">1000"]
       Found 10 repositories:
       1. jaredhanson/passport (22k ⭐)
       ...

Direct Terminal Usage

# Repository management
kb add expressjs/express
kb list --tag nodejs
kb info expressjs/express

# Search and discover
kb-search github "rate limiting nodejs" --stars ">500"
kb-search related expressjs/express

# Explore repositories
kb-explore clone expressjs/express
kb-explore analyze expressjs/express
kb-explore readme expressjs/express

# Track changes
kb-changes latest expressjs/express --detailed
kb-changes api-changes expressjs/express
kb-changes watch expressjs/express

# Check all watched repos
kb-changes updates

Features

1. Repository Management

  • Add repositories with full metadata (stars, language, topics)
  • Tag and categorize your collection
  • Track exploration status (bookmarked → exploring → explored)
  • Add personal notes about each repo
  • Persistent storage across sessions

2. Discovery

  • Search GitHub for repositories by topic, stars, language
  • Find repositories related to ones you've added
  • Smart suggestions based on your collection

3. Code Exploration

  • Clone repositories for local analysis
  • Analyze structure and identify key files
  • Find entry points, tests, and documentation
  • Display README and docs without opening files
  • Show directory tree structure

4. Code Search

  • Search for patterns across all your repos
  • Filter searches by tag or specific repo
  • Compare implementations between repos
  • Find examples of specific techniques

5. Change Tracking

  • Track latest releases and commits
  • Detect breaking changes automatically
  • Identify API changes and property renames (e.g., camelCase → snake_case)
  • Compare versions and show differences
  • Watch repositories for updates
  • Analyze changelogs for important changes
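
Rename detection of the camelCase → snake_case kind can be illustrated by comparing the identifier sets of two versions. The sketch below is one possible approach and not the algorithm used by kb_changes.py; both function names are hypothetical, and acronym-heavy names (for example `getHTTPResponse`) are not handled.

```python
import re


def camel_to_snake(name):
    """Convert camelCase to snake_case by splitting before each capital
    that follows a lowercase letter or digit."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


def detect_case_renames(old_names, new_names):
    """Return {old: new} for names that vanished from the old version and whose
    snake_case form appeared in the new version."""
    old_only = set(old_names) - set(new_names)
    new_only = set(new_names) - set(old_names)
    renames = {}
    for name in sorted(old_only):
        candidate = camel_to_snake(name)
        if candidate != name and candidate in new_only:
            renames[name] = candidate
    return renames
```

Restricting the comparison to names that exist in only one version keeps unchanged identifiers from being reported as renames.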

6. PDF Management (NEW!)

  • Smart Token Management - Estimates token cost before reading
  • Repository PDFs - Discover and index PDFs from cloned repos
  • Local Storage - Store PDFs in ~/.config/github-kb/notes/
  • Intelligent Summaries - Create summaries to save 80%+ tokens
  • Search & Organization - Tag and search PDFs by topic
  • Cost Transparency - Always shows token estimates before reading

📄 PDF Knowledge Base

Token-Aware PDF Management

One of the most important features of this skill is smart token management for PDFs. Reading large PDFs can consume significant tokens, so the skill provides transparency and tools to minimize waste.

Token Estimation Methodology

How we estimate tokens:

File Size (KB) × 15 ≈ Estimated Tokens
Pages ≈ File Size / 50 KB
Tokens per Page ≈ 750

Example estimates:
- 10-page paper (500 KB) → ~7,500 tokens
- 30-page guide (1.5 MB) → ~22,500 tokens
- 300-page book (15 MB) → ~225,000 tokens
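
As a rough sketch, the size-based estimate can be written as a small function. The three example estimates above work out to about 15 tokens per KB and 50 KB per page, so the constants below assume those rates; `estimate_pdf` is a hypothetical name, not a function from kb_pdf.py.

```python
TOKENS_PER_KB = 15  # assumption: the rate implied by the worked examples
KB_PER_PAGE = 50    # from "Pages ≈ File Size / 50KB"


def estimate_pdf(size_kb: float) -> dict:
    """Return rough page and token estimates for a PDF of the given size."""
    return {
        "pages": round(size_kb / KB_PER_PAGE),
        "tokens": round(size_kb * TOKENS_PER_KB),
    }


for label, size_kb in [("paper", 500), ("guide", 1500), ("book", 15000)]:
    est = estimate_pdf(size_kb)
    print(f"{label}: ~{est['pages']} pages, ~{est['tokens']:,} tokens")
```

File size is only a proxy: image-heavy PDFs carry far fewer tokens per KB than text-heavy ones, so treat both constants as tunable.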

Why this matters:
- Claude has a 200K token context window
- Reading a large book could consume your entire context!
- Summaries reduce token usage by 80-90%

Smart Token-Saving Features

1. Pre-Read Estimates

kb-pdf info research-paper.pdf

# Output shows:
# Estimated tokens: ~18,500
# ⚠️ Large PDF - summary recommended

2. Structured Summaries (NEW - Enhanced!)

kb-pdf summarize research-paper.pdf

# Creates intelligent template with:
# - Section-by-section breakdown
# - Topic index for quick lookup
# - Multiple reading strategies
# - Token estimates per section

# Result: ~2,000 token summary (saves ~16,500 tokens!)
# Plus: Can read individual sections (~500 tokens each)

3. Repository PDF Discovery

kb-pdf scan-repo facebook/react

# Shows all PDFs in repo with token estimates
# Lets you selectively add important ones

PDF Commands

# Add PDF from local file
kb-pdf add ~/Documents/react-internals.pdf --title "React Internals Guide" --tags react architecture

# Add PDF from cloned repository
kb-pdf scan-repo facebook/react  # Find PDFs
kb-pdf add ~/.config/github-kb/repos/facebook__react/docs/Architecture.pdf --source facebook/react

# Remove PDF from knowledge base
kb-pdf remove react-internals.pdf  # Original file not affected

# List all PDFs
kb-pdf list
kb-pdf list --tag architecture

# Get detailed info (with token estimate)
kb-pdf info react-internals.pdf

# Search PDFs
kb-pdf search "react"

# Create summary (token-saving!)
kb-pdf summarize react-internals.pdf

# Tag PDFs
kb-pdf tag react-internals.pdf frontend performance

🎯 Smart Book Detection (Avoid Token Waste!)

The Problem: Many popular technical books (Clean Code, Refactoring, Design Patterns, etc.) are already in Claude's training data. Reading a PDF of one of these books spends tens of thousands of tokens on material Claude can already discuss.

The Solution: The skill automatically detects when you're adding a known book and provides ready-to-use prompts instead.

Known Books Detection

When you try to add a known book, you'll get a warning. Pass the book's actual title with --title so it can be recognized:

kb-pdf add ~/Downloads/clean-code.pdf --title "Clean Code by Robert Martin"

# Output:
⚠️  TOKEN ALERT!

📚 Clean Code: A Handbook of Agile Software Craftsmanship
   Author: Robert C. Martin (2008)

✅ This book is already in Claude's training data!
   Reading the PDF will waste ~45,000 tokens.

💡 Instead, use these ready-to-use prompts:

   summary:
   "Summarize the main principles from Clean Code by Robert Martin"

   apply_to_code:
   "Review this code using Clean Code principles and suggest improvements"

   specific_topic:
   "Explain the [TOPIC] principles from Clean Code with examples"

💰 Token Savings: Skip the PDF and use the prompts above!

🔍 For more info: kb-books search "Clean Code"
📦 See combinations: kb-books combos

Known Books Commands

# List all known books
kb-books list

# Search for a book
kb-books search "clean code"
kb-books search "refactoring"

# Check if a book is known before adding PDF
kb-books check "Clean Code by Robert Martin"

# View curated combinations
kb-books combos

# Show combination details with prompts
kb-books combo clean-code-fundamentals
kb-books combo java-mastery
kb-books combo software-architecture

Currently Known Books (8 books)

Clean Code - Robert C. Martin (2008)
Refactoring - Martin Fowler (2018)
Design Patterns - Gang of Four (1994)
Clean Architecture - Robert C. Martin (2017)
Effective Java - Joshua Bloch (2017)
Effective Python - Brett Slatkin (2019)
The Pragmatic Programmer - Hunt & Thomas (2019)
Domain-Driven Design - Eric Evans (2003)
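
One way title-based detection could work is a normalized substring match against the list above. This is a sketch under that assumption, not the skill's actual matching logic; `KNOWN_BOOKS`, `normalize`, and `find_known_book` are hypothetical names.

```python
import re

# Hypothetical registry; the titles are copied from the list above.
KNOWN_BOOKS = [
    "Clean Code", "Refactoring", "Design Patterns", "Clean Architecture",
    "Effective Java", "Effective Python", "The Pragmatic Programmer",
    "Domain-Driven Design",
]


def normalize(text: str) -> str:
    """Lowercase and reduce to space-separated alphanumeric words."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def find_known_book(title: str):
    """Return the longest known title contained in `title`, or None."""
    wanted = f" {normalize(title)} "
    # Padding with spaces makes the match whole-word only.
    matches = [b for b in KNOWN_BOOKS if f" {normalize(b)} " in wanted]
    return max(matches, key=len) if matches else None


print(find_known_book("Clean Code by Robert Martin"))
print(find_known_book("Attention Is All You Need"))
```

Whole-word matching keeps a title such as "Refactoring Databases" from slipping through unnoticed, but it would also flag it as "Refactoring", so a real implementation would probably check the author as well.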

Curated Book Combinations

1. Clean Code Fundamentals (clean-code-fundamentals)
- Books: Clean Code, Refactoring, Design Patterns, Clean Architecture
- Use: "Review this code using Clean Code, Refactoring, Design Patterns, and Clean Architecture"

2. Java Best Practices (java-mastery)
- Books: Effective Java, Clean Code, Design Patterns
- Use: "Review this Java code using Effective Java, Clean Code, and Design Patterns principles"

3. Python Best Practices (python-mastery)
- Books: Effective Python, Clean Code, Design Patterns
- Use: "Review this Python code using Effective Python, Clean Code, and Design Patterns principles"

4. Software Architecture (software-architecture)
- Books: Clean Architecture, Design Patterns, Domain-Driven Design
- Use: "Design architecture for [SYSTEM] using Clean Architecture, Design Patterns, and DDD"

5. Software Craftsmanship (craftsmanship)
- Books: Pragmatic Programmer, Clean Code, Refactoring, Clean Architecture
- Use: "Evaluate my development approach using all four craftsmanship books"

Token Savings Example

# ❌ WITHOUT Detection (Wasteful)
kb-pdf add clean-code.pdf           # Adds PDF
Later: "Summarize Clean Code"       # Reads 45,000 tokens

# ✅ WITH Detection (Efficient)
kb-pdf add clean-code.pdf           # Warns + provides prompts
Instead: Use prompt directly        # No PDF tokens read

Token Savings: 45,000 tokens (100%)

Token Usage Examples

Without Summaries (Token-Heavy):

You: "What do my PDFs say about React Fiber?"

Claude reads:
- react-fiber-architecture.pdf (20,000 tokens)
- react-reconciliation.pdf (15,000 tokens)
- react-performance.pdf (18,000 tokens)

Total: 53,000 tokens consumed! 💸

With Summaries (Token-Efficient):

You: "What do my PDFs say about React Fiber?"

Claude reads summaries:
- react-fiber-architecture.summary.md (1,500 tokens)
- react-reconciliation.summary.md (1,200 tokens)
- react-performance.summary.md (1,800 tokens)

Total: 4,500 tokens (saves 48,500 tokens!) ✅

Structured Summaries: Maximum Token Efficiency

The skill creates intelligent, structured summaries that maximize token savings and usability:

Summary Structure

# Structured Summary: React Fiber Architecture

**Total Pages**: 48 | **Full PDF**: ~18,500 tokens
**This Summary**: ~2,000 tokens | **Savings**: ~16,500 tokens (89%)

## 📋 Document Overview
- Main topic and key takeaway (~200 tokens)
- Quick decision: Is this document relevant?

## 🗂️ Document Structure

### Section 1: Introduction (Pages 1-8, ~3,200 tokens)
**Summary**: Introduces React Fiber as a rewrite...
**Key Concepts**: reconciliation, async rendering
**When to Read**: Understanding fundamentals

### Section 2: Architecture (Pages 9-25, ~6,800 tokens)
**Summary**: Describes Fiber node structure...
**Key Concepts**: work loop, priority queue
**When to Read**: Implementation details

### Section 3: Examples (Pages 26-48, ~8,500 tokens)
**Summary**: Practical code examples...
**When to Read**: Working code needed

## 🔍 Topic Index
| Topic | Sections | Pages | Why Important |
|-------|----------|-------|---------------|
| Reconciliation | 1, 2 | 5-20 | Core algorithm |
| Priority Queue | 2 | 15-18 | Scheduling |
| Code Examples | 3 | 30-45 | Implementation |

## 💡 Reading Strategies
- **Overview only**: Read Document Overview (~200 tokens)
- **Specific topic**: Use Topic Index + read 1 section (~800 tokens)
- **Full understanding**: Read entire summary (~2,000 tokens)
- **Deep dive**: Read full PDF (18,500 tokens)

Token Savings by Use Case

| Your Need | Read This | Tokens | vs Full PDF |
|-----------|-----------|--------|-------------|
| "Is this relevant?" | Overview | ~200 | 99% saved |
| "Quick reference on X" | Topic Index + 1 section | ~800 | 96% saved |
| "General understanding" | Full summary | ~2,000 | 89% saved |
| "Deep research" | Full PDF | 18,500 | 0% saved |

Long-Term ROI

Investment: Create summary once (18,500 tokens to read + summarize)

Returns: Every future use

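| # of Uses | Without Summary | With Summary | Savings |
|-----------|-----------------|--------------|---------|
| 1 | 18,500 | 18,500 | 0 |
| 2 | 37,000 | 20,500 | 16,500 (45%) |
| 5 | 92,500 | 26,500 | 66,000 (71%) |
| 10 | 185,000 | 36,500 | 148,500 (80%) |
| 20 | 370,000 | 56,500 | 313,500 (85%) |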

Break-even: After just 2 uses!
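
The figures above follow from a simple cost model: the first use reads the full PDF (18,500 tokens) to create the summary, and every later use reads only the ~2,000-token summary. A short sketch that reproduces them, with `cumulative_cost` as a hypothetical helper name:

```python
FULL_PDF_TOKENS = 18_500  # cost of reading the full PDF once
SUMMARY_TOKENS = 2_000    # cost of reading the summary


def cumulative_cost(uses: int):
    """Return (tokens without a summary, tokens with a summary) after `uses` reads."""
    without = uses * FULL_PDF_TOKENS
    # First use pays for the full read; later uses read only the summary.
    with_summary = FULL_PDF_TOKENS + (uses - 1) * SUMMARY_TOKENS
    return without, with_summary


for uses in (1, 2, 5, 10, 20):
    without, with_summary = cumulative_cost(uses)
    saved = without - with_summary
    print(f"{uses:>2} uses: {without:>7,} vs {with_summary:>6,} "
          f"-> saves {saved:,} ({saved / without:.0%})")
```

The model leaves out the output tokens spent writing the summary itself, so the real break-even point may come slightly later.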

Smart Section Reading

Instead of reading the entire PDF, read only the relevant sections:

User: "How does React Fiber handle scheduling?"

Strategy 1 (Wasteful):
→ Read full PDF: 18,500 tokens

Strategy 2 (Smart):
→ Check summary topic index: 0 tokens (already loaded)
→ Find "scheduling" in Section 2
→ Read Section 2 summary: ~300 tokens
→ If more detail is needed, read Section 2 only: ~6,800 tokens

Token savings: at least 11,400 tokens (62%), even when Section 2 is read in full
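
The lookup step in Strategy 2 can be sketched as a plain dictionary lookup. The section sizes and topics below are copied from the sample summary earlier in this document; the data layout and the names `sections_for` and `reading_cost` are assumptions made for illustration.

```python
# Values copied from the sample summary; the structure itself is hypothetical.
SECTIONS = {
    1: {"title": "Introduction", "full_tokens": 3_200},
    2: {"title": "Architecture", "full_tokens": 6_800},
    3: {"title": "Examples", "full_tokens": 8_500},
}
TOPIC_INDEX = {
    "reconciliation": [1, 2],
    "priority queue": [2],
    "code examples": [3],
}


def sections_for(topic: str):
    """Return the section numbers listed for a topic (empty if unknown)."""
    return TOPIC_INDEX.get(topic.lower(), [])


def reading_cost(topic: str) -> int:
    """Tokens needed to read every section listed for the topic in full."""
    return sum(SECTIONS[n]["full_tokens"] for n in sections_for(topic))


full_pdf = sum(s["full_tokens"] for s in SECTIONS.values())
cost = reading_cost("Priority Queue")
print(f"Sections: {sections_for('Priority Queue')}, cost: {cost:,} of {full_pdf:,} tokens")
```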

Workflow: Adding PDFs from Repository

# 1. Clone a repository with documentation
kb add facebook/react
kb-explore clone facebook/react

# 2. Discover PDFs in the repo
kb-pdf scan-repo facebook/react

# Output:
# 📚 Found 3 PDF(s) in facebook/react
# 📄 docs/Architecture.pdf
#    Size: 2.4 MB | ~48 pages | ~18,500 tokens
# 📄 docs/Fiber.pdf
#    Size: 1.8 MB | ~36 pages | ~14,000 tokens

# 3. Selectively add important PDFs
kb-pdf add ~/.config/github-kb/repos/facebook__react/docs/Fiber.pdf \
  --source facebook/react \
  --tags react fiber architecture

# 4. Create summary to save tokens
kb-pdf summarize Fiber.pdf

# 5. Ask Claude to complete the summary
# "Please read ~/.config/github-kb/notes/Fiber.pdf and complete
#  the summary at ~/.config/github-kb/notes/Fiber.summary.md"

# 6. Future usage reads summary (saves ~12,000 tokens)

Computational Efficiency: Beyond Token Savings

Structured summaries don't just save tokens - they reduce unnecessary computation:

Traditional Approach (Wasteful)

User asks 5 questions about a PDF over 1 week:

Day 1: "What's this about?" → Read full PDF (18,500 tokens)
Day 2: "How does X work?" → Read full PDF again (18,500 tokens)
Day 3: "Where's the code?" → Read full PDF again (18,500 tokens)
Day 4: "What about Y?" → Read full PDF again (18,500 tokens)
Day 5: "Summary please" → Read full PDF again (18,500 tokens)

Total: 92,500 tokens
Computational waste: Read same content 5 times
Environmental impact: 5× the energy consumption

Structured Summary Approach (Efficient)

Day 0: Create summary once (18,500 tokens)

Day 1: "What's this about?" → Read overview (200 tokens)
Day 2: "How does X work?" → Read relevant section summary (300 tokens)
Day 3: "Where's the code?" → Check topic index → Section 3 (400 tokens)
Day 4: "What about Y?" → Read Section 2 summary (300 tokens)
Day 5: "Summary please" → Already have it! (0 tokens, point to summary)

Total: 19,700 tokens
Savings: 72,800 tokens (79%)
Computation reduction: Read full content once, reuse compressed version
Environmental benefit: 79% less energy per use

Benefits

1. Token Efficiency
- 79-89% reduction in token usage
- Lower API costs
- More conversation capacity

2. Computational Efficiency
- Process document once, not repeatedly
- Faster responses (reading summary vs full PDF)
- Reduced server load

3. Environmental Impact
- Less computation = less energy
- Sustainable AI usage
- Responsible resource management

4. Better User Experience
- Quick answers from summaries
- Option to deep-dive when needed
- Clear token costs upfront

Best Practices for Token Efficiency

1. Always Check Estimates First

kb-pdf info document.pdf  # See token cost before reading

2. Create Summaries for Large PDFs

# If PDF > 10,000 tokens, create summary
kb-pdf summarize large-document.pdf

3. Use Tags to Find PDFs Without Reading

kb-pdf search "architecture"  # Find relevant PDFs
kb-pdf list --tag react       # Filter by topic

4. Selective Reading

# Good: Read specific PDF
"Read the Fiber architecture PDF"

# Wasteful: Read all PDFs
"Read all my React PDFs"  # Could be 100K+ tokens!

Data Storage

All data is stored in ~/.config/github-kb/:

~/.config/github-kb/
├── index.json              # Registry of all repositories
├── repos/                  # Cloned repositories
├── notes/                  # Your notes about repos
│   ├── *.pdf              # Stored PDFs
│   ├── *.summary.md       # PDF summaries (token-efficient!)
│   ├── pdf_index.json     # PDF metadata and token estimates
│   └── owner__repo.md     # Repository notes
├── changes/                # Change tracking data for watched repos
└── cache/                  # Cached data

This persists across Claude Code sessions.
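
The layout above maps a repository name such as facebook/react to a directory named facebook__react. A minimal sketch of that mapping with pathlib follows; the helper names are hypothetical, and the real scripts may build these paths differently.

```python
from pathlib import Path

KB_ROOT = Path.home() / ".config" / "github-kb"


def repo_slug(full_name: str) -> str:
    """Turn 'owner/repo' into the 'owner__repo' form used on disk."""
    owner, repo = full_name.split("/", 1)
    return f"{owner}__{repo}"


def clone_dir(full_name: str, root: Path = KB_ROOT) -> Path:
    """Directory where a cloned repository lives."""
    return root / "repos" / repo_slug(full_name)


def notes_file(full_name: str, root: Path = KB_ROOT) -> Path:
    """Markdown notes file for a repository."""
    return root / "notes" / f"{repo_slug(full_name)}.md"


def summary_file(pdf_name: str, root: Path = KB_ROOT) -> Path:
    """Summary file that sits next to a stored PDF."""
    return root / "notes" / f"{Path(pdf_name).stem}.summary.md"


print(clone_dir("facebook/react"))
print(notes_file("facebook/react"))
print(summary_file("Fiber.pdf"))
```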

Scripts

The skill includes five Python scripts:

  • kb.py - Repository management (add, list, tag, note)
  • kb_search.py - Search GitHub and your KB
  • kb_explore.py - Clone and explore repositories
  • kb_changes.py - Track changes, releases, and API modifications
  • kb_pdf.py - PDF management with smart token optimization

You can run these directly or let Claude handle them conversationally.

A GitHub token is optional, but it raises the API rate limit:

Without a token: 60 API requests/hour
With a token: 5,000 API requests/hour

Setup

  1. Create token at https://github.com/settings/tokens
  2. Select scope: public_repo
  3. Export in your shell:
export GITHUB_TOKEN=ghp_your_token_here

Add to ~/.bashrc or ~/.zshrc to persist.
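
To check that the token is being picked up, a script can send it as a Bearer token and ask GitHub's /rate_limit endpoint for the current quota. The sketch below uses only the standard library; `github_headers` and `rate_limit` are hypothetical helper names, not functions from the skill's scripts.

```python
import json
import os
import urllib.request


def github_headers(env=os.environ) -> dict:
    """Build request headers, adding the token only when GITHUB_TOKEN is set."""
    headers = {
        "Accept": "application/vnd.github+json",
        "User-Agent": "github-kb-sketch",  # placeholder client name
    }
    token = env.get("GITHUB_TOKEN")
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def rate_limit():
    """Return (limit, remaining) for the core API. Needs network access."""
    req = urllib.request.Request(
        "https://api.github.com/rate_limit", headers=github_headers()
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        core = json.load(resp)["resources"]["core"]
    return core["limit"], core["remaining"]


print("Authenticated:", "Authorization" in github_headers())
```

Calling `rate_limit()` should report a limit of 60 without a token and 5,000 with one.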

Example Workflows

Learning a New Framework

You: "I want to learn GraphQL. Find me the best repos."
Claude: [Searches, suggests graphql-js, apollo-server, etc.]

You: "Add the top 3"
Claude: [Adds them, tags as 'graphql', 'learning']

You: "Explore the graphql-js repo"
Claude: [Clones, analyzes structure, shows key files]

You: "How does it handle validation?"
Claude: [Searches code, shows examples]

Solving a Problem

You: "I need to add rate limiting to my API. Find solutions."
Claude: [Searches GitHub for rate limiting libraries]

You: "Add express-rate-limit"
Claude: [Adds, shows info]

You: "Clone and show me usage examples"
Claude: [Clones, finds tests and examples]

You: "Compare it with node-rate-limiter-flexible"
Claude: [Adds second lib, compares approaches]

Building Expertise

You: "Show my backend repos"
Claude: [Lists all repos tagged 'backend']

You: "How do they handle authentication?"
Claude: [Searches for auth patterns across repos, shows examples]

You: "Compare Express and Fastify middleware"
Claude: [Shows side-by-side comparison]

Commands Reference

Note: After running ./install-commands.sh, you can use short commands like kb add instead of python kb.py add.

KB Management

# Add repository
kb add facebook/react

# List all repos
kb list

# List by tag
kb list --tag frontend

# Tag a repo
kb tag facebook/react frontend ui library

# Add notes
kb note facebook/react "Great hooks implementation"

# Set status
kb status facebook/react explored

# Show details
kb info facebook/react

# Show statistics
kb stats

# Remove repo
kb remove facebook/react

# Using Python directly (without install-commands.sh)
python kb.py add facebook/react
python kb.py list
python kb.py tag facebook/react frontend ui library
# etc...

Search & Discovery

# Search GitHub
kb-search github "react state" --stars ">1000"

# Find related repos
kb-search related facebook/react

# Search your KB
kb-search code "useEffect" --tag frontend

# Compare repos
kb-search compare express fastify "middleware"

Exploration

# Clone repository
kb-explore clone facebook/react

# Sync (pull updates)
kb-explore sync facebook/react

# Analyze structure
kb-explore analyze facebook/react

# Show tree
kb-explore tree facebook/react --depth 2

# View README
kb-explore readme facebook/react

# Find docs
kb-explore docs facebook/react

# Find entry points
kb-explore entry-points facebook/react

# Find tests
kb-explore find-tests facebook/react

Change Tracking

# Show latest changes
kb-changes latest facebook/react
kb-changes latest facebook/react --detailed

# View changelog
kb-changes changelog facebook/react

# Track API changes (detects property renames)
kb-changes api-changes facebook/react
kb-changes api-changes facebook/react --pattern "*.ts"

# Compare versions
kb-changes compare facebook/react v17.0.0 v18.0.0

# Watch for updates
kb-changes watch facebook/react

# Check watched repos
kb-changes updates

PDF Management

# Add PDF to knowledge base
kb-pdf add ~/Documents/paper.pdf --title "Research Paper" --tags ml research

# Remove PDF from knowledge base
kb-pdf remove paper.pdf

# List PDFs (with token estimates)
kb-pdf list
kb-pdf list --tag architecture

# Get PDF info (shows token cost)
kb-pdf info paper.pdf

# Find PDFs in cloned repository
kb-pdf scan-repo facebook/react

# Create summary (saves 80%+ tokens!)
kb-pdf summarize paper.pdf

# Search PDFs
kb-pdf search "machine learning"

# Tag PDFs
kb-pdf tag paper.pdf ai transformers

Tips

  1. Start with search - Find repos before adding
  2. Tag consistently - Use standard tags (frontend, backend, auth, etc.)
  3. Clone selectively - Only clone what you'll explore
  4. Use shallow clones - Add --depth 1 for large repos
  5. Document as you learn - Add notes immediately
  6. Track progress - Update status as you explore
  7. Regular reviews - Check your KB weekly

Advanced Usage

Find repositories for a topic

You: "Find the best TypeScript testing libraries"

Build a learning path

You: "I want to learn microservices. Create a collection."
Claude: [Finds and adds relevant repos, tags them, organizes by difficulty]

Research best practices

You: "How do popular repos handle configuration?"
Claude: [Searches across your KB, shows different approaches]

Architecture study

You: "Compare monorepo vs single-repo structures"
Claude: [Finds examples, analyzes structure, compares approaches]

Track breaking changes

You: "What's new in React 18? Any breaking changes?"
Claude: [Shows latest release with detailed analysis]
       "React 18 introduced:
        - Breaking: Automatic batching for all updates
        - API Change: createRoot replaces render
        - New: Concurrent features and Suspense"

Monitor API changes

You: "Track API changes in the Anthropic SDK"
Claude: [Runs api-changes detection]
       "Detected changes:
        - Property renamed: maxTokens → max_tokens
        - Property renamed: stopSequences → stop_sequences
        - Function signature changed: create() now requires model param"

Compare versions

You: "What changed between Next.js 13 and 14?"
Claude: [Compares versions, shows commits and file changes]
       "196 commits, major changes in:
        - app/ directory improvements
        - Server Actions stability
        - Turbopack updates"

Troubleshooting

"Knowledge base is empty"
→ Add repositories: python kb.py add owner/repo

"Repository not cloned yet"
→ Clone it: python kb_explore.py clone owner/repo

"API rate limit exceeded"
→ Set GITHUB_TOKEN environment variable

"No matches found"
→ Ensure repo is cloned, try broader search pattern

Requirements

  • Python 3.7+
  • Git
  • Internet connection (for GitHub API)
  • Optional: ripgrep (rg) for faster code search

No Python packages required - uses only standard library!
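
Because ripgrep is optional, code search needs a pure-Python path when rg is not installed. The sketch below shows one way to choose a backend and what a standard-library fallback might look like. It is an illustration only: `search_backend` and `python_search` are hypothetical names, and kb_search.py may work differently.

```python
import re
import shutil
from pathlib import Path


def search_backend() -> str:
    """Prefer ripgrep when it is on PATH, otherwise fall back to Python."""
    return "rg" if shutil.which("rg") else "python"


def python_search(pattern: str, root: str):
    """Return (path, line_number, line) for every matching line under root."""
    regex = re.compile(pattern)
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), 1):
            if regex.search(line):
                hits.append((str(path), number, line.strip()))
    return hits


print("Search backend:", search_backend())
```

Unlike ripgrep, this fallback does not honor .gitignore and walks every file, which is why rg is noticeably faster on large clones.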

File Structure

github-knowledge-base/
├── SKILL.md              # Main skill instructions for Claude
├── README.md             # This file
├── SETUP.md              # Installation and setup guide
├── scripts/
│   ├── kb.py            # Repository management
│   ├── kb_search.py     # Search and discovery
│   ├── kb_explore.py    # Repository exploration
│   ├── kb_changes.py    # Change tracking and analysis
│   └── kb_pdf.py        # PDF management with smart token optimization
└── references/
    ├── workflows.md     # Detailed workflow examples
    └── github-api.md    # GitHub API reference

Contributing Ideas

This skill can be extended with:
- Dependency graph visualization
- Automated summarization of repos
- Export to different formats
- Integration with note-taking apps
- Collaborative knowledge bases
- ML-based recommendations

License

This skill is part of Claude Code and follows its licensing.

Support

For issues or questions:
- Check the references/ directory for detailed docs
- Ask Claude Code for help with specific tasks
- Refer to workflow examples in references/workflows.md


Happy exploring! Build your knowledge, one repository at a time.
