199-biotechnologies / deep-research

# Install this skill:
npx skills add 199-biotechnologies/claude-deep-research-skill

Or install directly from GitHub: npx add-skill https://github.com/199-biotechnologies/claude-deep-research-skill

# Description

Conduct enterprise-grade research with multi-source synthesis, citation tracking, and verification. Use when user needs comprehensive analysis requiring 10+ sources, verified claims, or comparison of approaches. Triggers include "deep research", "comprehensive analysis", "research report", "compare X vs Y", or "analyze trends". Do NOT use for simple lookups, debugging, or questions answerable with 1-2 searches.

# SKILL.md


---
name: deep-research
description: Conduct enterprise-grade research with multi-source synthesis, citation tracking, and verification. Use when user needs comprehensive analysis requiring 10+ sources, verified claims, or comparison of approaches. Triggers include "deep research", "comprehensive analysis", "research report", "compare X vs Y", or "analyze trends". Do NOT use for simple lookups, debugging, or questions answerable with 1-2 searches.
---


Deep Research

Core System Instructions

Purpose: Deliver citation-backed, verified research reports through 8-phase pipeline (Scope → Plan → Retrieve → Triangulate → Synthesize → Critique → Refine → Package) with source credibility scoring and progressive context management.

Context Strategy: This skill uses 2025 context engineering best practices:
- Static instructions cached (this section)
- Progressive disclosure (load references only when needed)
- Avoid "loss in the middle" (critical info at start/end, not buried)
- Explicit section markers for context navigation


Decision Tree (Execute First)

Request Analysis
├─ Simple lookup? → STOP: Use WebSearch, not this skill
├─ Debugging? → STOP: Use standard tools, not this skill
└─ Complex analysis needed? → CONTINUE

Mode Selection
├─ Initial exploration? → quick (3 phases, 2-5 min)
├─ Standard research? → standard (6 phases, 5-10 min) [DEFAULT]
├─ Critical decision? → deep (8 phases, 10-20 min)
└─ Comprehensive review? → ultradeep (8+ phases, 20-45 min)

Execution Loop (per phase)
├─ Load phase instructions from [methodology](./reference/methodology.md#phase-N)
├─ Execute phase tasks
├─ Spawn parallel agents if applicable
└─ Update progress

Validation Gate
├─ Run `python scripts/validate_report.py --report [path]`
├─ Pass? → Deliver
└─ Fail? → Fix (max 2 attempts) → Still fails? → Escalate

Workflow (Clarify → Plan → Act → Verify → Report)

AUTONOMY PRINCIPLE: This skill operates independently. Infer assumptions from query context. Only stop for critical errors or incomprehensible queries.

1. Clarify (Rarely Needed - Prefer Autonomy)

DEFAULT: Proceed autonomously. Derive assumptions from query signals.

ONLY ask if CRITICALLY ambiguous:
- Query is incomprehensible (e.g., "research the thing")
- Contradictory requirements (e.g., "quick 50-source ultradeep analysis")

When in doubt: PROCEED with standard mode. User will redirect if incorrect.

Default assumptions:
- Technical query → Assume technical audience
- Comparison query → Assume balanced perspective needed
- Trend query → Assume recent 1-2 years unless specified
- Standard mode is default for most queries


2. Plan

Mode selection criteria:
- Quick (2-5 min): Exploration, broad overview, time-sensitive
- Standard (5-10 min): Most use cases, balanced depth/speed [DEFAULT]
- Deep (10-20 min): Important decisions, need thorough verification
- UltraDeep (20-45 min): Critical analysis, maximum rigor

Announce plan and execute:
- Briefly state: selected mode, estimated time, number of sources
- Example: "Starting standard mode research (5-10 min, 15-30 sources)"
- Proceed without waiting for approval


3. Act (Phase Execution)

All modes execute:
- Phase 1: SCOPE - Define boundaries (method)
- Phase 3: RETRIEVE - Parallel search execution (5-10 concurrent searches + agents) (method)
- Phase 8: PACKAGE - Generate report using template

Standard/Deep/UltraDeep execute:
- Phase 2: PLAN - Strategy formulation
- Phase 4: TRIANGULATE - Verify 3+ sources per claim
- Phase 4.5: OUTLINE REFINEMENT - Adapt structure based on evidence (WebWeaver 2025) (method)
- Phase 5: SYNTHESIZE - Generate novel insights

Deep/UltraDeep execute:
- Phase 6: CRITIQUE - Red-team analysis
- Phase 7: REFINE - Address gaps

Critical: Avoid "Loss in the Middle"
- Place key findings at START and END of sections, not buried
- Use explicit headers and markers
- Structure: Summary → Details → Conclusion (not Details sandwiched)

Progressive Context Loading:
- Load methodology sections on-demand
- Load template only for Phase 8
- Do not inline everything - reference external files

Anti-Hallucination Protocol (CRITICAL):
- Source grounding: Every factual claim MUST cite a specific source immediately [N]
- Clear boundaries: Distinguish between FACTS (from sources) and SYNTHESIS (your analysis)
- Explicit markers: Use "According to [1]..." or "[1] reports..." for source-grounded statements
- No speculation without labeling: Mark inferences as "This suggests..." not "Research shows..."
- Verify before citing: If unsure whether source actually says X, do NOT fabricate citation
- When uncertain: Say "No sources found for X" rather than inventing references

Parallel Execution Requirements (CRITICAL for Speed):

Phase 3 RETRIEVE - Mandatory Parallel Search:
1. Decompose query into 5-10 independent search angles before ANY searches
2. Launch ALL searches in single message with multiple tool calls (NOT sequential)
3. Quality threshold monitoring for FFS pattern (a minimal sketch follows the examples below):
- Track source count and avg credibility score
- Proceed when threshold reached (mode-specific, see methodology)
- Continue background searches for additional depth
4. Spawn 3-5 parallel agents using Task tool for deep-dive investigations

Example correct execution:

[Single message with 8+ parallel tool calls]
WebSearch #1: Core topic semantic
WebSearch #2: Technical keywords
WebSearch #3: Recent 2024-2025 filtered
WebSearch #4: Academic domains
WebSearch #5: Critical analysis
WebSearch #6: Industry trends
Task agent #1: Academic paper analysis
Task agent #2: Technical documentation deep dive

❌ WRONG (sequential execution):

WebSearch #1 → wait for results → WebSearch #2 → wait → WebSearch #3...

✅ RIGHT (parallel execution):

All searches + agents launched simultaneously in one message
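
The quality-threshold monitoring described in step 3 above can be pictured as a simple gate like the following sketch. The threshold values here are illustrative assumptions, not the skill's actual mode-specific values (see reference/methodology.md for those):

```python
# Illustrative FFS threshold check; thresholds below are assumptions, not the
# skill's documented mode-specific values.
def ffs_threshold_reached(sources, mode="standard"):
    """Return True once enough sufficiently credible sources are collected."""
    thresholds = {  # (min source count, min average credibility 0-100)
        "quick": (5, 50),
        "standard": (10, 60),
        "deep": (20, 60),
        "ultradeep": (30, 65),
    }
    min_sources, min_avg = thresholds[mode]
    if len(sources) < min_sources:
        return False
    avg = sum(s["credibility"] for s in sources) / len(sources)
    return avg >= min_avg
```

Once this gate passes, the phase can proceed while remaining searches continue in the background for additional depth.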

4. Verify (Always Execute)

Step 1: Citation Verification (Catches Fabricated Sources)

python scripts/verify_citations.py --report [path]

Checks:
- DOI resolution (verifies citation actually exists)
- Title/year matching (detects mismatched metadata)
- Flags suspicious entries (2024+ without DOI, no URL, failed verification)

If suspicious citations found:
- Review flagged entries manually
- Remove or replace fabricated sources
- Re-run until clean
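
The DOI resolution check above amounts to confirming that a cited DOI actually resolves at doi.org. A minimal sketch of that kind of check, using only the standard library (the real verify_citations.py may implement it differently):

```python
# Minimal DOI existence check; some servers reject HEAD requests, so a
# production check might fall back to GET. Illustrative only.
import urllib.request
import urllib.error

def doi_resolves(doi: str, timeout: float = 10.0) -> bool:
    """Return True if https://doi.org/<doi> resolves to a non-error response."""
    req = urllib.request.Request(f"https://doi.org/{doi}", method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except (urllib.error.HTTPError, urllib.error.URLError):
        return False
```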

Step 2: Structure & Quality Validation

python scripts/validate_report.py --report [path]

8 automated checks:
1. Executive summary length (50-250 words)
2. Required sections present (+ recommended: Claims table, Counterevidence)
3. Citations formatted [1], [2], [3]
4. Bibliography matches citations
5. No placeholder text (TBD, TODO)
6. Word count reasonable (500-10000)
7. Minimum 10 sources
8. No broken internal links

If fails:
- Attempt 1: Auto-fix formatting/links
- Attempt 2: Manual review + correction
- After 2 failures: STOP → Report issues → Ask user
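
Check #4 above (bibliography matches citations) reduces to comparing the citation numbers used in the body against the numbered entries in the bibliography. A minimal sketch, assuming the bibliography section is introduced by a "## Bibliography" heading (the real validate_report.py may differ):

```python
# Illustrative consistency check between in-text citations and bibliography.
import re

def citation_mismatches(report_text: str) -> set:
    """Return citation numbers used in the body but missing a bibliography entry."""
    body, _, bibliography = report_text.partition("## Bibliography")
    used = set(re.findall(r"\[(\d+)\]", body))
    defined = set(re.findall(r"^\[(\d+)\]", bibliography, flags=re.MULTILINE))
    return used - defined
```

An empty set means every in-text citation has a corresponding bibliography entry.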


5. Report

CRITICAL: Generate COMPREHENSIVE, DETAILED markdown reports

File Organization (CRITICAL - Clean Accessibility):

1. Create Organized Folder in Documents:
- ALWAYS create dedicated folder: ~/Documents/[TopicName]_Research_[YYYYMMDD]/
- Extract clean topic name from research question (remove special chars, use underscores/CamelCase)
- Examples:
- "psilocybin research 2025" → ~/Documents/Psilocybin_Research_20251104/
- "compare React vs Vue" → ~/Documents/React_vs_Vue_Research_20251104/
- "AI safety trends" → ~/Documents/AI_Safety_Trends_Research_20251104/
- If folder exists, use it; if not, create it
- This ensures clean organization and easy accessibility

2. Save All Formats to Same Folder:

Markdown (Primary Source):
- Save to: [Documents folder]/research_report_[YYYYMMDD]_[topic_slug].md
- Also save copy to: ~/.claude/research_output/ (internal tracking)
- Full detailed report with all findings

HTML (McKinsey Style - ALWAYS GENERATE):
- Save to: [Documents folder]/research_report_[YYYYMMDD]_[topic_slug].html
- Use McKinsey template: ./templates/mckinsey_report_template.html
- Design principles: Sharp corners (NO border-radius), muted corporate colors (navy #003d5c, gray #f8f9fa), ultra-compact layout, info-first structure
- Place critical metrics dashboard at top (extract 3-4 key quantitative findings)
- Use data tables for dense information presentation
- 14px base font, compact spacing, no decorative gradients or colors
- Attribution Gradients (2025): Wrap each citation [N] in <span class="citation"> with nested tooltip div showing source details
- OPEN in browser automatically after generation

PDF (Professional Print - ALWAYS GENERATE):
- Save to: [Documents folder]/research_report_[YYYYMMDD]_[topic_slug].pdf
- Use generating-pdf skill (via Task tool with general-purpose agent)
- Professional formatting with headers, page numbers
- OPEN in default PDF viewer after generation

3. File Naming Convention:
All files use same base name for easy matching:
- research_report_20251104_psilocybin_2025.md
- research_report_20251104_psilocybin_2025.html
- research_report_20251104_psilocybin_2025.pdf

Length Requirements (UNLIMITED with Progressive Assembly):
- Quick mode: 2,000+ words (baseline quality threshold)
- Standard mode: 4,000+ words (comprehensive analysis)
- Deep mode: 6,000+ words (thorough investigation)
- UltraDeep mode: 10,000-50,000+ words (NO UPPER LIMIT - as comprehensive as evidence warrants)

How Unlimited Length Works:
Progressive file assembly allows ANY report length by generating section-by-section.
Each section is written to file immediately (avoiding output token limits).
Complex topics with many findings? Generate 20, 30, 50+ findings - no constraint!

Content Requirements:
- Use template as exact structure
- Generate each section to APPROPRIATE depth (determined by evidence, not word targets)
- Include specific data, statistics, dates, numbers (not vague statements)
- Multiple paragraphs per finding with evidence (as many as needed)
- Each section gets focused generation attention
- DO NOT write summaries - write FULL analysis

Writing Standards:
- Narrative-driven: Write in flowing prose. Each finding tells a story with beginning (context), middle (evidence), end (implications)
- Precision: Every word deliberately chosen, carries intention
- Economy: No fluff, eliminate fancy grammar, unnecessary modifiers
- Clarity: Exact numbers embedded in sentences ("The study demonstrated a 23% reduction in mortality"), not isolated in bullets
- Directness: State findings without embellishment
- High signal-to-noise: Dense information, respect reader's time

Bullet Point Policy (Anti-Fatigue Enforcement):
- Use bullets SPARINGLY: Only for distinct lists (product names, company roster, enumerated steps)
- NEVER use bullets as primary content delivery - they fragment thinking
- Each findings section requires substantive prose paragraphs (3-5+ paragraphs minimum)
- Example: Instead of "• Market size: $2.4B" write "The global market reached $2.4 billion in 2023, driven by increasing consumer demand and regulatory tailwinds [1]."

Anti-Fatigue Quality Check (Apply to EVERY Section):
Before considering a section complete, verify:
- [ ] Paragraph count: ≥3 paragraphs for major sections (## headings)
- [ ] Prose-first: <20% of content is bullet points (≥80% must be flowing prose)
- [ ] No placeholders: Zero instances of "Content continues", "Due to length", "[Sections X-Y]"
- [ ] Evidence-rich: Specific data points, statistics, quotes (not vague statements)
- [ ] Citation density: Major claims cited within same sentence

If ANY check fails: Regenerate the section before moving to next.
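
The "<20% bullet points" check can be approximated with a simple line-based count. This is an illustrative sketch, not the skill's defined metric:

```python
# Rough prose-vs-bullets ratio; line-based counting is an assumption.
def bullet_ratio(section_text: str) -> float:
    """Fraction of non-empty lines that are bullet points."""
    lines = [ln.strip() for ln in section_text.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    bullets = sum(1 for ln in lines if ln.startswith(("-", "*", "•")))
    return bullets / len(lines)

# Example: flag a section for regeneration if bullet_ratio(section) >= 0.20
```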

Source Attribution Standards (Critical for Preventing Fabrication):
- Immediate citation: Every factual claim followed by [N] citation in same sentence
- Quote sources directly: Use "According to [1]..." or "[1] reports..." for factual statements
- Distinguish fact from synthesis:
- ✅ GOOD: "Mortality decreased 23% (p<0.01) in the treatment group [1]."
- ❌ BAD: "Studies show mortality improved significantly."
- No vague attributions:
- ❌ NEVER: "Research suggests...", "Studies show...", "Experts believe..."
- ✅ ALWAYS: "Smith et al. (2024) found..." [1], "According to FDA data..." [2]
- Label speculation explicitly:
- ✅ GOOD: "This suggests a potential mechanism..." (analysis, not fact)
- ❌ BAD: "The mechanism is..." (presented as fact without citation)
- Admit uncertainty:
- ✅ GOOD: "No sources found addressing X directly."
- ❌ BAD: Fabricating a citation to fill the gap
- Template pattern: "[Specific claim with numbers/data] [Citation]. [Analysis/implication]."

Deliver to user:
1. Executive summary (inline in chat)
2. Organized folder path (e.g., "All files saved to: ~/Documents/Psilocybin_Research_20251104/")
3. Confirmation of all three formats generated:
- Markdown (source)
- HTML (McKinsey-style, opened in browser)
- PDF (professional print, opened in viewer)
4. Source quality assessment summary (source count)
5. Next steps (if relevant)

Generation Workflow: Progressive File Assembly (Unlimited Length)

Phase 8.1: Setup

# Extract topic slug from research question
# Create folder: ~/Documents/[TopicName]_Research_[YYYYMMDD]/
mkdir -p ~/Documents/[folder_name]

# Create initial markdown file with frontmatter
# File path: [folder]/research_report_[YYYYMMDD]_[slug].md
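
A minimal sketch of this setup step in Python, following the folder and file naming convention above. The exact slug and casing rules are assumptions for illustration:

```python
# Illustrative Phase 8.1 setup: build the Documents folder and report path.
import re
from datetime import date
from pathlib import Path

def setup_report_paths(research_question: str) -> Path:
    """Create ~/Documents/[TopicName]_Research_[YYYYMMDD]/ and return the .md path."""
    stamp = date.today().strftime("%Y%m%d")
    # Replace special characters with underscores, e.g. "psilocybin research 2025"
    # becomes "Psilocybin_Research_2025" (casing rule is an assumption).
    topic = re.sub(r"[^A-Za-z0-9]+", "_", research_question).strip("_").title()
    folder = Path.home() / "Documents" / f"{topic}_Research_{stamp}"
    folder.mkdir(parents=True, exist_ok=True)
    return folder / f"research_report_{stamp}_{topic.lower()}.md"
```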

Phase 8.2: Progressive Section Generation

CRITICAL STRATEGY: Generate and write each section individually to file using Write/Edit tools.
This allows unlimited report length while keeping each generation manageable.

OUTPUT TOKEN LIMIT SAFEGUARD (CRITICAL - Claude Code Default: 32K):

Claude Code default limit: 32,000 output tokens (≈24,000 words total per skill execution)
This is a HARD LIMIT and cannot be changed within the skill.

What this means:
- Total output (your text + all tool call content) must be <32,000 tokens
- 32,000 tokens ≈ 24,000 words max
- Leave safety margin: Target ≤20,000 words total output

Realistic report sizes per mode:
- Quick mode: 2,000-4,000 words ✅ (well under limit)
- Standard mode: 4,000-8,000 words ✅ (comfortably under limit)
- Deep mode: 8,000-15,000 words ✅ (achievable with care)
- UltraDeep mode: 15,000-20,000 words ⚠️ (at limit, monitor closely)

For reports >20,000 words:
User must run skill multiple times:
- Run 1: "Generate Part 1 (sections 1-6)" → saves to part1.md
- Run 2: "Generate Part 2 (sections 7-12)" → saves to part2.md
- User manually combines or asks Claude to merge files

Auto-Continuation Strategy (TRUE Unlimited Length):

When report exceeds 18,000 words in single run:
1. Generate sections 1-10 (stay under 18K words)
2. Save continuation state file with context preservation
3. Spawn continuation agent via Task tool
4. Continuation agent: Reads state → Generates next batch → Spawns next agent if needed
5. Chain continues recursively until complete

This achieves UNLIMITED length while respecting 32K limit per agent

Initialize Citation Tracking:

citations_used = []  # Maintain this list in working memory throughout
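
One simple way to keep this list current is to scan each generated section for [N] markers before appending it to the file. The skill tracks this in working memory rather than via a script; the sketch below is illustrative:

```python
# Illustrative citation tracker: record new [N] numbers in order of first use.
import re

def track_citations(section_text: str, citations_used: list) -> None:
    """Append any new [N] citation numbers found in this section."""
    for num in (int(n) for n in re.findall(r"\[(\d+)\]", section_text)):
        if num not in citations_used:
            citations_used.append(num)
```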

Section Generation Loop:

Pattern: Generate section content → Use Write/Edit tool with that content → Move to next section
Each Write/Edit call contains ONE section (≤2,000 words per call)

  1. Executive Summary (200-400 words)
     - Generate section content
     - Tool: Write(file, content=frontmatter + Executive Summary)
     - Track citations used
     - Progress: "✓ Executive Summary"

  2. Introduction (400-800 words)
     - Generate section content
     - Tool: Edit(file, old=last_line, new=old + Introduction section)
     - Track citations used
     - Progress: "✓ Introduction"

  3. Finding 1 (600-2,000 words)
     - Generate complete finding
     - Tool: Edit(file, append Finding 1)
     - Track citations used
     - Progress: "✓ Finding 1"

  4. Finding 2 (600-2,000 words)
     - Generate complete finding
     - Tool: Edit(file, append Finding 2)
     - Track citations used
     - Progress: "✓ Finding 2"

... Continue for ALL findings (each finding = one Edit tool call, ≤2,000 words)

CRITICAL: If you have 10 findings × 1,500 words each = 15,000 words of findings, this is fine because each Edit call is only 1,500 words (under the 2,000-word limit per tool call). The FILE grows to 15,000 words, but no single tool call exceeds limits.

  5. Synthesis & Insights
     - Generate: novel insights beyond source statements (as long as needed for synthesis)
     - Tool: Edit (append to file)
     - Track: extract citations, append to citations_used
     - Progress: "Generated Synthesis ✓"

  6. Limitations & Caveats
     - Generate: counterevidence, gaps, uncertainties (appropriate depth)
     - Tool: Edit (append to file)
     - Track: extract citations, append to citations_used
     - Progress: "Generated Limitations ✓"

  7. Recommendations
     - Generate: immediate actions, next steps, research needs (appropriate depth)
     - Tool: Edit (append to file)
     - Track: extract citations, append to citations_used
     - Progress: "Generated Recommendations ✓"

  8. Bibliography (CRITICAL - ALL citations)
     - Generate: COMPLETE bibliography with EVERY citation from the citations_used list
     - Format: [1], [2], [3]... [N] - each citation gets a full entry
     - Verification: check the citations_used list - if it contains [1] through [73], generate all 73 entries
     - NO ranges ([1-50]), NO placeholders ("Additional citations"), NO truncation
     - Tool: Edit (append to file)
     - Progress: "Generated Bibliography ✓ (N citations)"

  9. Methodology Appendix
     - Generate: research process, verification approach (appropriate depth)
     - Tool: Edit (append to file)
     - Progress: "Generated Methodology ✓"

Phase 8.3: Auto-Continuation Decision Point

After generating sections, check word count:

If total output ≤18,000 words: Complete normally
- Generate Bibliography (all citations)
- Generate Methodology
- Verify complete report
- Save copy to ~/.claude/research_output/
- Done! ✓

If total output will exceed 18,000 words: Auto-Continuation Protocol

Step 1: Save Continuation State
Create file: ~/.claude/research_output/continuation_state_[report_id].json

{
  "version": "2.1.1",
  "report_id": "[unique_id]",
  "file_path": "[absolute_path_to_report.md]",
  "mode": "[quick|standard|deep|ultradeep]",

  "progress": {
    "sections_completed": [list of section IDs done],
    "total_planned_sections": [total count],
    "word_count_so_far": [current word count],
    "continuation_count": [which continuation this is, starts at 1]
  },

  "citations": {
    "used": [1, 2, 3, ..., N],
    "next_number": [N+1],
    "bibliography_entries": [
      "[1] Full citation entry",
      "[2] Full citation entry",
      ...
    ]
  },

  "research_context": {
    "research_question": "[original question]",
    "key_themes": ["theme1", "theme2", "theme3"],
    "main_findings_summary": [
      "Finding 1: [100-word summary]",
      "Finding 2: [100-word summary]",
      ...
    ],
    "narrative_arc": "[Current position in story: beginning/middle/conclusion]"
  },

  "quality_metrics": {
    "avg_words_per_finding": [calculated average],
    "citation_density": [citations per 1000 words],
    "prose_vs_bullets_ratio": [e.g., "85% prose"],
    "writing_style": "technical-precise-data-driven"
  },

  "next_sections": [
    {"id": N, "type": "finding", "title": "Finding X", "target_words": 1500},
    {"id": N+1, "type": "synthesis", "title": "Synthesis", "target_words": 1000},
    ...
  ]
}
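
A continuation agent can load this state with a few lines of stdlib code. The sketch below is illustrative; field names follow the schema above, and the path pattern matches Step 1:

```python
# Illustrative state loader for a continuation agent.
import json
from pathlib import Path

def load_continuation_state(report_id: str) -> dict:
    """Read the continuation state and return what the next agent needs."""
    state_path = (Path.home() / ".claude" / "research_output"
                  / f"continuation_state_{report_id}.json")
    state = json.loads(state_path.read_text())
    return {
        "report_path": state["file_path"],
        "next_citation": state["citations"]["next_number"],  # resume numbering here
        "remaining_sections": state["next_sections"],
        "quality_targets": state["quality_metrics"],
    }
```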

Step 2: Spawn Continuation Agent

Use Task tool with general-purpose agent:

Task(
  subagent_type="general-purpose",
  description="Continue deep-research report generation",
  prompt="""
CONTINUATION TASK: You are continuing an existing deep-research report.

CRITICAL INSTRUCTIONS:
1. Read continuation state file: ~/.claude/research_output/continuation_state_[report_id].json
2. Read existing report to understand context: [file_path from state]
3. Read LAST 3 completed sections to understand flow and style
4. Load research context: themes, narrative arc, writing style from state
5. Continue citation numbering from state.citations.next_number
6. Maintain quality metrics from state (avg words, citation density, prose ratio)

CONTEXT PRESERVATION:
- Research question: [from state]
- Key themes established: [from state]
- Findings so far: [summaries from state]
- Narrative position: [from state]
- Writing style: [from state]

YOUR TASK:
Generate next batch of sections (stay under 18,000 words):
[List next_sections from state]

Use Write/Edit tools to append to existing file: [file_path]

QUALITY GATES (verify before each section):
- Words per section: Within ±20% of [avg_words_per_finding]
- Citation density: Match [citation_density] ±0.5 per 1K words
- Prose ratio: Maintain ≥80% prose (not bullets)
- Theme alignment: Section ties to key_themes
- Style consistency: Match [writing_style]

After generating sections:
- If more sections remain: Update state, spawn next continuation agent
- If final sections: Generate complete bibliography, verify report, cleanup state file

HANDOFF PROTOCOL (if spawning next agent):
1. Update continuation_state.json with new progress
2. Add new citations to state
3. Add summaries of new findings to state
4. Update quality metrics
5. Spawn next agent with same instructions
"""
)

Step 3: Report Continuation Status
Tell user:

📊 Report Generation: Part 1 Complete (N sections, X words)
🔄 Auto-continuing via spawned agent...
   Next batch: [section list]
   Progress: [X%] complete

Phase 8.4: Continuation Agent Quality Protocol

When continuation agent starts:

Context Loading (CRITICAL):
1. Read continuation_state.json → Load ALL context
2. Read existing report file → Review last 3 sections
3. Extract patterns:
- Sentence structure complexity
- Technical terminology used
- Citation placement patterns
- Paragraph transition style

Pre-Generation Checklist:
- [ ] Loaded research context (themes, question, narrative arc)
- [ ] Reviewed previous sections for flow
- [ ] Loaded citation numbering (start from N+1)
- [ ] Loaded quality targets (words, density, style)
- [ ] Understand where in narrative arc (beginning/middle/end)

Per-Section Generation:
1. Generate section content
2. Quality checks:
- Word count: Within target ±20%
- Citation density: Matches established rate
- Prose ratio: ≥80% prose
- Theme connection: Ties to key_themes
- Style match: Consistent with quality_metrics.writing_style
3. If ANY check fails: Regenerate section
4. If passes: Write to file, update state

Handoff Decision:
- Calculate: Current word count + remaining sections × avg_words_per_section
- If total < 18K: Generate all remaining sections + finish
- If total > 18K: Generate partial batch, update state, spawn next agent
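
A minimal sketch of that handoff decision, using the 18K-word per-agent budget described above:

```python
# Illustrative handoff decision: project final size and compare to the budget.
def should_spawn_next_agent(current_words: int, sections_remaining: int,
                            avg_words_per_section: int, budget: int = 18_000) -> bool:
    projected = current_words + sections_remaining * avg_words_per_section
    return projected > budget
```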

Final Agent Responsibilities:
- Generate final content sections
- Generate COMPLETE bibliography using ALL citations from state.citations.bibliography_entries
- Read entire assembled report
- Run validation: python scripts/validate_report.py --report [path]
- Delete continuation_state.json (cleanup)
- Report complete to user with metrics

Anti-Fatigue Built-In:
Each agent generates manageable chunks (≤18K words), maintaining quality.
Context preservation ensures coherence across continuation boundaries.

Generate HTML (McKinsey Style)
1. Read McKinsey template from ./templates/mckinsey_report_template.html
2. Extract 3-4 key quantitative metrics from findings for dashboard
3. Use Python script for MD to HTML conversion:

cd ~/.claude/skills/deep-research
python scripts/md_to_html.py [markdown_report_path]

The script returns two parts:
- Part A ({{CONTENT}}): All sections except Bibliography, properly converted to HTML
- Part B ({{BIBLIOGRAPHY}}): Bibliography section only, formatted as HTML

CRITICAL: The script handles ALL conversion automatically:
- Headers: ## → <div class="section"><h2 class="section-title">, ### → <h3 class="subsection-title">
- Lists: Markdown bullets → <ul><li> with proper nesting
- Tables: Markdown tables → <table> with thead/tbody
- Paragraphs: Text wrapped in <p> tags
- Bold/italic: **text** → <strong>, *text* → <em>
- Citations: [N] preserved for tooltip conversion in step 4

  4. Add Citation Tooltips (Attribution Gradients):
     For each [N] citation in {{CONTENT}} (not the bibliography), optionally add interactive tooltips:

     <span class="citation">[N]
       <span class="citation-tooltip">
         <div class="tooltip-title">[Source Title]</div>
         <div class="tooltip-source">[Author/Publisher]</div>
         <div class="tooltip-claim">
           <div class="tooltip-claim-label">Supports Claim:</div>
           [Extract sentence with this citation]
         </div>
       </span>
     </span>

     NOTE: This step is optional for speed. Basic [N] citations are sufficient.

  5. Replace placeholders in template:
     - {{TITLE}} - Report title (extract from first ## heading in MD)
     - {{DATE}} - Generation date (YYYY-MM-DD format)
     - {{SOURCE_COUNT}} - Number of unique sources
     - {{METRICS_DASHBOARD}} - Metrics HTML from step 2
     - {{CONTENT}} - HTML from Part A (script output)
     - {{BIBLIOGRAPHY}} - HTML from Part B (script output)

  6. CRITICAL: NO EMOJIS - Remove any emoji characters from final HTML

  7. Save to: [folder]/research_report_[YYYYMMDD]_[slug].html

  8. Verify HTML (MANDATORY):
     python scripts/verify_html.py --html [html_path] --md [md_path]
     - Check passes: Proceed to step 9
     - Check fails: Fix errors and re-run verification

  9. Open in browser: open [html_path]
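
The placeholder substitution in step 5 is a straightforward string replacement over the template. A minimal sketch, assuming the placeholder names listed above (the assembly code itself is illustrative, not part of the shipped scripts):

```python
# Illustrative template fill: replace {{KEY}} placeholders with generated HTML.
from pathlib import Path

def fill_template(template_path: str, values: dict) -> str:
    html = Path(template_path).read_text()
    for key, value in values.items():
        html = html.replace("{{" + key + "}}", value)
    return html

# Usage sketch:
# html = fill_template("templates/mckinsey_report_template.html", {
#     "TITLE": title, "DATE": "2025-11-04", "SOURCE_COUNT": "27",
#     "METRICS_DASHBOARD": metrics_html, "CONTENT": content_html,
#     "BIBLIOGRAPHY": bibliography_html,
# })
```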

Generate PDF
1. Use Task tool with general-purpose agent
2. Invoke generating-pdf skill with markdown as input
3. Save to: [folder]/research_report_[YYYYMMDD]_[slug].pdf
4. PDF will auto-open when complete


Output Contract

Format: Comprehensive markdown report following template EXACTLY

Required sections (all must be detailed):
- Executive Summary (2-3 concise paragraphs, 50-250 words)
- Introduction (2-3 paragraphs: question, scope, methodology, assumptions)
- Main Analysis (4-8 findings, each 300-500 words with citations [1], [2], [3])
- Synthesis & Insights (500-1000 words: patterns, novel insights, implications)
- Limitations & Caveats (2-3 paragraphs: gaps, assumptions, uncertainties)
- Recommendations (3-5 immediate actions, 3-5 next steps, 3-5 further research)
- Bibliography (CRITICAL - see rules below)
- Methodology Appendix (2-3 paragraphs: process, sources, verification)

Bibliography Requirements (ZERO TOLERANCE - Report is UNUSABLE without complete bibliography):
- ✅ MUST include EVERY citation [N] used in report body (if report has [1]-[50], write all 50 entries)
- ✅ Format: [N] Author/Org (Year). "Title". Publication. URL (Retrieved: Date)
- ✅ Each entry on its own line, complete with all metadata
- ❌ NO placeholders: NEVER use "[8-75] Additional citations", "...continue...", "etc.", "[Continue with sources...]"
- ❌ NO ranges: Write [3], [4], [5]... individually, NOT "[3-50]"
- ❌ NO truncation: If 30 sources cited, write all 30 entries in full
- ⚠️ Validation WILL FAIL if bibliography contains placeholders or missing citations
- ⚠️ Report is GARBAGE without complete bibliography - no way to verify claims

Strictly Prohibited:
- Placeholder text (TBD, TODO, [citation needed])
- Uncited major claims
- Broken links
- Missing required sections
- Short summaries instead of detailed analysis
- Vague statements without specific evidence

Writing Standards (Critical):
- Narrative-driven: Write in flowing prose with complete sentences that build understanding progressively
- Precision: Choose each word deliberately - every word must carry intention
- Economy: Eliminate fluff, unnecessary adjectives, fancy grammar
- Clarity: Use precise technical terms, avoid ambiguity. Embed exact numbers in sentences, not bullets
- Directness: State findings clearly without embellishment
- Signal-to-noise: High information density, respect reader's time
- Bullet discipline: Use bullets only for distinct lists (products, companies, steps). Default to prose paragraphs
- Examples of precision:
- Bad: "significantly improved outcomes" → Good: "reduced mortality 23% (p<0.01)"
- Bad: "several studies suggest" → Good: "5 RCTs (n=1,847) show"
- Bad: "potentially beneficial" → Good: "increased biomarker X by 15%"
- Bad: "• Market: $2.4B" → Good: "The market reached $2.4 billion in 2023, driven by consumer demand [1]."

Quality gates (enforced by validator):
- Minimum 2,000 words (standard mode)
- Average credibility score >60/100
- 3+ sources per major claim
- Clear facts vs. analysis distinction
- All sections present and detailed


Error Handling & Stop Rules

Stop immediately if:
- 2 validation failures on same error → Pause, report, ask user
- <5 sources after exhaustive search → Report limitation, request direction
- User interrupts/changes scope → Confirm new direction

Graceful degradation:
- 5-10 sources → Note in limitations, proceed with extra verification
- Time constraint reached → Package partial results, document gaps
- High-priority critique issue → Address immediately

Error format:

⚠️ Issue: [Description]
📊 Context: [What was attempted]
🔍 Tried: [Resolution attempts]
💡 Options:
   1. [Option 1]
   2. [Option 2]
   3. [Option 3]

Quality Standards (Always Enforce)

Every report must:
- 10+ sources (document if fewer)
- 3+ sources per major claim
- Executive summary <250 words
- Full citations with URLs
- Credibility assessment
- Limitations section
- Methodology documented
- No placeholders

Priority: Thoroughness over speed. Quality > speed.


Inputs & Assumptions

Required:
- Research question (string)

Optional:
- Mode (quick/standard/deep/ultradeep)
- Time constraints
- Required perspectives/sources
- Output format

Assumptions:
- User requires verified, citation-backed information
- 10-50 sources available on topic
- Time investment: 5-45 minutes


When to Use / NOT Use

Use when:
- Comprehensive analysis (10+ sources needed)
- Comparing technologies/approaches/strategies
- State-of-the-art reviews
- Multi-perspective investigation
- Technical decisions
- Market/trend analysis

Do NOT use:
- Simple lookups (use WebSearch)
- Debugging (use standard tools)
- 1-2 search answers
- Time-sensitive quick answers


Scripts (Offline, Python stdlib only)

Location: ./scripts/

  • research_engine.py - Orchestration engine
  • validate_report.py - Quality validation (8 checks)
  • citation_manager.py - Citation tracking
  • source_evaluator.py - Credibility scoring (0-100)

No external dependencies required.
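
For orientation, credibility scoring of this kind typically combines domain and recency signals into a 0-100 value. The sketch below is purely illustrative; the actual heuristics in source_evaluator.py may weigh different signals:

```python
# Purely illustrative credibility heuristic (0-100); not the shipped logic.
from urllib.parse import urlparse

def credibility_score(url: str, year=None) -> int:
    domain = urlparse(url).netloc.lower()
    score = 50  # neutral baseline
    if domain.endswith((".gov", ".edu")) or "doi.org" in domain:
        score += 30  # institutional or DOI-backed source
    elif domain.endswith(".org"):
        score += 10
    if year is not None and year >= 2023:
        score += 10  # recency bonus
    return min(score, 100)
```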


Progressive References (Load On-Demand)

Do not inline these - reference only:
- Complete Methodology - 8-phase details
- Report Template - Output structure
- README - Usage docs
- Quick Start - Fast reference
- Competitive Analysis - vs OpenAI/Gemini

Context Management: Load files on-demand for current phase only. Do not preload all content.



Dynamic Execution Zone

User Query Processing:
[User research question will be inserted here during execution]

Retrieved Information:
[Search results and sources will be accumulated here]

Generated Analysis:
[Findings, synthesis, and report content generated here]

Note: This section remains empty in the skill definition. Content populated during runtime only.

# README.md

Deep Research Skill for Claude Code

A comprehensive research engine that brings Claude Desktop's Advanced Research capabilities (and more) to Claude Code terminal.

Features

Core Research Pipeline

  • 8.5-Phase Research Pipeline: Scope → Plan → Retrieve (Parallel) → Triangulate → Outline Refinement → Synthesize → Critique → Refine → Package
  • Multiple Research Modes: Quick, Standard, Deep, and UltraDeep
  • Graph-of-Thoughts Reasoning: Non-linear exploration with branching thought paths

2025 Enhancements (Latest - v2.2)

  • 🔄 Auto-Continuation System (NEW): TRUE UNLIMITED length (50K, 100K+ words) via recursive agent spawning with context preservation
  • 📄 Progressive File Assembly: Section-by-section generation with quality safeguards
  • ⚡ Parallel Search Execution: 5-10 concurrent searches + parallel agents (3-5x faster Phase 3)
  • 🎯 First Finish Search (FFS) Pattern: Adaptive completion based on quality thresholds
  • 🔍 Enhanced Citation Validation (CiteGuard): Hallucination detection, URL verification, multi-source cross-checking
  • 📋 Dynamic Outline Evolution (WebWeaver): Adapt structure after Phase 4 based on evidence
  • 🔗 Attribution Gradients UI: Interactive citation tooltips showing evidence chains in HTML reports
  • 🛡️ Anti-Fatigue Enforcement: Prose-first quality checks prevent bullet-point degradation

Traditional Strengths

  • Citation Management: Automatic source tracking and bibliography generation
  • Source Credibility Assessment: Evaluates source quality and potential biases
  • Structured Reports: Professional markdown, HTML (McKinsey-style), and PDF outputs
  • Verification & Triangulation: Cross-references claims across multiple sources

Installation

The skill is already installed globally in ~/.claude/skills/deep-research/

No additional dependencies required for basic usage.

Usage

In Claude Code

Simply invoke the skill:

Use deep research to analyze the state of quantum computing in 2025

Or specify a mode:

Use deep research in ultradeep mode to compare PostgreSQL vs Supabase

Direct CLI Usage

# Standard research
python ~/.claude/skills/deep-research/research_engine.py --query "Your research question" --mode standard

# Deep research (all 8 phases)
python ~/.claude/skills/deep-research/research_engine.py --query "Your research question" --mode deep

# Quick research (3 phases only)
python ~/.claude/skills/deep-research/research_engine.py --query "Your research question" --mode quick

# Ultra-deep research (extended iterations)
python ~/.claude/skills/deep-research/research_engine.py --query "Your research question" --mode ultradeep

Research Modes

| Mode | Phases | Duration | Best For |
|------|--------|----------|----------|
| Quick | 3 phases | 2-5 min | Simple topics, initial exploration |
| Standard | 6 phases | 5-10 min | Most research questions |
| Deep | 8 phases | 10-20 min | Complex topics requiring thorough analysis |
| UltraDeep | 8+ phases | 20-45 min | Critical decisions, comprehensive reports |

Output

Research reports are saved to organized folders in ~/Documents/[Topic]_Research_[Date]/

Each report includes:
- Executive Summary
- Detailed Analysis with Citations
- Synthesis & Insights
- Limitations & Caveats
- Recommendations
- Full Bibliography
- Methodology Appendix

Unlimited Report Generation (2025 Auto-Continuation System)

Reports use progressive file assembly with auto-continuation - achieving truly unlimited length through recursive agent spawning:

How It Works:

  1. Initial Generation (up to 18K words)
     - Generate sections 1-10 progressively
     - Each section written to file immediately (stays under the 32K limit per agent)
     - Save continuation state with research context

  2. Auto-Continuation (if needed)
     - Automatically spawns a continuation agent via the Task tool
     - Continuation agent loads state: themes, narrative arc, citations, quality metrics
     - Generates the next batch of sections (another 18K words)
     - Updates state and spawns the next agent if more sections remain

  3. Recursive Chaining
     - Each agent stays under the 32K output token limit
     - Chain continues until all sections are complete
     - Final agent generates the bibliography and validates the report

Realistic Report Sizes:
- Quick mode: 2,000-4,000 words (single run) ✅
- Standard mode: 4,000-8,000 words (single run) ✅
- Deep mode: 8,000-15,000 words (single run) ✅
- UltraDeep mode: 20,000-100,000+ words (auto-continuation) ✅

Example: 50,000 word report:
- Agent 1: Sections 1-10 (18K words) → Spawns Agent 2
- Agent 2: Sections 11-20 (18K words) → Spawns Agent 3
- Agent 3: Sections 21-25 + Bibliography (14K words) → Complete!
- Total: 50K words across 3 agents, each under 32K limit

Context Preservation (Quality Safeguards):

Continuation state includes:
- ✅ Research question and key themes
- ✅ Main findings summaries (100 words each)
- ✅ Narrative arc position (beginning/middle/end)
- ✅ Quality metrics (avg words, citation density, prose ratio)
- ✅ All citations used + bibliography entries
- ✅ Writing style characteristics

Each continuation agent:
- Reads last 3 sections to understand flow
- Maintains established themes and style
- Continues citation numbering correctly
- Matches quality metrics (±20% tolerance)
- Verifies coherence before each section

Quality Gates (Per Section):
- [ ] Word count: Within ±20% of average
- [ ] Citation density: Matches established rate
- [ ] Prose ratio: ≥80% prose (not bullets)
- [ ] Theme alignment: Ties to key themes
- [ ] Style consistency: Matches established patterns

Benefits:
- ✅ TRUE unlimited length (50K, 100K+ words achievable)
- ✅ Fully automatic (no manual intervention)
- ✅ Context preserved across continuations
- ✅ Quality maintained throughout
- ✅ Each agent stays under 32K token limit
- ✅ Progressive assembly prevents truncation

Examples

Technology Analysis

Use deep research to evaluate whether we should adopt Next.js 15 for our project

Market Research

Use deep research to analyze longevity biotech funding trends 2023-2025

Technical Decision

Use deep research to compare authentication solutions: Auth0 vs Clerk vs Supabase Auth

Scientific Review

Use deep research in ultradeep mode to summarize recent advances in senolytic therapies

Quality Standards

Every research output:
- ✅ Minimum 10+ distinct sources
- ✅ Citations for all major claims
- ✅ Cross-verified facts (3+ sources)
- ✅ Executive summary under 250 words
- ✅ Limitations section
- ✅ Full bibliography
- ✅ Methodology documentation

Architecture

deep-research/
├── SKILL.md                    # Main skill definition
├── research_engine.py          # Core orchestration engine
├── utils/
│   ├── citation_manager.py    # Citation tracking & bibliography
│   └── source_evaluator.py    # Source credibility assessment
├── requirements.txt
└── README.md

Tips for Best Results

  1. Be Specific: Frame questions clearly with context
  2. Set Expectations: Specify if you need comparisons, recommendations, or pure analysis
  3. Choose Appropriate Mode: Use Quick for exploration, Deep for decisions
  4. Review Scope: Check Phase 1 output to ensure research is on track
  5. Leverage Citations: Use citation numbers to drill deeper into specific sources

Comparison with Claude Desktop Research

Compared with Claude Desktop Research, this skill covers multi-source synthesis, citation tracking, iterative refinement, source verification (enhanced), credibility scoring, the 8-phase methodology, Graph-of-Thoughts reasoning, multiple research modes, local file integration, and code execution.

2025 Research Papers Implemented

This skill now incorporates cutting-edge techniques from 2025 academic research:

  1. Parallel Execution (GAP, Flash-Searcher, TPS-Bench)
     - DAG-based parallel tool use for independent subtasks
     - 3-5x faster retrieval phase
     - Concurrent search strategies

  2. First Finish Search (arXiv 2505.18149)
     - Quality threshold gates by mode
     - Continue background searches for depth
     - Optimal latency-accuracy tradeoff

  3. Citation Validation (CiteGuard, arXiv 2510.17853)
     - Hallucination pattern detection
     - Multi-source verification (DOI + URL)
     - Strict mode for critical reports

  4. Dynamic Outlines (WebWeaver, arXiv 2509.13312)
     - Evidence-driven structure adaptation
     - Phase 4.5 refinement step
     - Prevents locked-in research paths

  5. Attribution Gradients (arXiv 2510.00361)
     - Interactive evidence chains
     - Hover tooltips in HTML reports
     - Improved auditability

Version

2.0 (2025-11-05) - Major update with 2025 research enhancements
1.0 (2025-11-04) - Initial release

License

User skill - modify as needed for your workflow

# Supported AI Coding Agents

This skill follows the SKILL.md standard and works with all major AI coding agents that support it.

Learn more about the SKILL.md standard to use this skill with your preferred AI coding agent.