# deep-research
# Install this skill:
npx skills add mikeng-io/agent-skills --skill "deep-research"

Installs this specific skill from the multi-skill repository.

# Description

Generic multi-domain research framework with domain-aware scheduling. Model-agnostic and domain-agnostic - perform comprehensive research on any topic.

# SKILL.md


---
name: deep-research
description: Generic multi-domain research framework with domain-aware scheduling. Model-agnostic and domain-agnostic - perform comprehensive research on any topic.
location: managed
context: fork
allowed-tools:
- ToolSearch
- mcp__brave-search__brave_web_search
- mcp__web-search-prime__webSearchPrime
- mcp__web-reader__webReader
- mcp__sequential-thinking__sequentialthinking
- mcp__context7__query-docs
- mcp__zread__search_doc
- mcp__zread__read_file
- mcp__playwright__browser_navigate
- mcp__playwright__browser_snapshot
- mcp__playwright__browser_click
- mcp__playwright__browser_fill_form
- mcp__playwright__browser_take_screenshot
- mcp__playwright__browser_evaluate
- mcp__browser-tools__takeScreenshot
- mcp__browser-tools__getConsoleLogs
- mcp__browser-tools__getNetworkLogs
- Read
- Task
- Skill
- Write
- Bash(mkdir *)
---


# Deep Research: Multi-Domain Research Framework

Execute this skill to perform comprehensive research on any topic using domain-aware scheduling and parallel information gathering.

## Execution Instructions

When invoked, you will:

1. Discover available research tools using ToolSearch:
   - Find all available web search MCP tools
   - Discover browser automation tools (Playwright, browser-tools)
   - Identify documentation query tools
   - Build a tool inventory for research execution

2. Analyze the research intent from the conversation to extract:
   - Primary research topic/question
   - Domains explicitly mentioned
   - Domains inferred from context
   - Research depth and scope indicators
   - Research methodology requirements (web search, crawling, interactive)

3. Create a research plan with domain-aware effort allocation:
   - Primary domains get 70% of research effort
   - Secondary domains get 30% of research effort
   - Generate 5-10 search queries per domain
   - Identify sites that may require browser automation
   - Plan a crawling strategy for complex sources

4. Execute parallel research using multiple domain-focused agents:
   - Spawn domain researchers in parallel
   - Use web search for general information
   - Use browser automation for interactive sites, paywalls, or dynamic content
   - Use the agent-browser skill for complex web interactions
   - Collect and validate findings from each source

5. Synthesize findings across domains:
   - Identify patterns and consensus
   - Surface contradictions and debates
   - Generate evidence-based recommendations

6. Validate the output format against the schema

7. Generate and save the report to .outputs/research/


## Step 0: Discover Available Research Tools

Before beginning research, discover all available MCP tools and skills that can be used for information gathering.

Tool Discovery Process

Use ToolSearch to find available research tools:

# Search for web search tools
ToolSearch: "web search"
  → Returns: brave-search, web-search-prime, etc.

# Search for browser automation tools
ToolSearch: "playwright browser"
  → Returns: playwright navigation, screenshot, interaction tools

# Search for content extraction tools
ToolSearch: "web reader content"
  → Returns: web-reader, content extraction tools

# Search for documentation query tools
ToolSearch: "documentation query"
  → Returns: context7, zread tools

Build Tool Inventory

Create an inventory of available tools for the research session:

tool_inventory:
  web_search:
    - mcp__brave-search__brave_web_search
    - mcp__web-search-prime__webSearchPrime

  web_reading:
    - mcp__web-reader__webReader

  browser_automation:
    - mcp__playwright__browser_navigate
    - mcp__playwright__browser_snapshot
    - mcp__playwright__browser_click
    - mcp__playwright__browser_fill_form
    - mcp__playwright__browser_evaluate
    - mcp__browser-tools__takeScreenshot
    - mcp__browser-tools__getConsoleLogs

  documentation:
    - mcp__context7__query-docs
    - mcp__zread__search_doc

  skills:
    - agent-browser (for complex web interactions)

Tool Selection Strategy

Choose tools based on research requirements:

Standard Web Research:
- Use web-search tools for general queries
- Use web-reader for content extraction
- Fall back gracefully if tools are unavailable

Dynamic/Interactive Content:
- Use browser_automation (Playwright) for:
  - JavaScript-heavy sites
  - Sites requiring interaction (forms, buttons)
  - Dynamic content that loads on scroll
  - Paywalled content (with proper authorization)

Complex Interactions:
- Use agent-browser skill for:
  - Multi-step workflows (login → navigate → extract)
  - Complex form filling
  - Sites requiring human-like interaction
  - Screenshot capture with analysis

Tool Availability Handling

If preferred tools are unavailable:
1. Log missing tools
2. Use available alternatives
3. Adjust research strategy accordingly
4. Note limitations in final report

Example fallback chain:

Preferred: Playwright → mcp__browser-tools → agent-browser skill
Fallback: web-reader → basic web-search
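
As an illustration, walking such a chain could look like the sketch below. The tool names mirror the inventory above, and `call_tool` is a hypothetical dispatcher for invoking an MCP tool; it is not part of the skill itself.

```python
# Hypothetical sketch: try each tool in the fallback chain until one succeeds.
# `call_tool` stands in for whatever mechanism actually invokes an MCP tool.
FETCH_CHAIN = [
    "mcp__playwright__browser_navigate",    # preferred: browser automation
    "mcp__web-reader__webReader",           # fallback: static content extraction
    "mcp__brave-search__brave_web_search",  # last resort: search snippets only
]

def fetch_with_fallback(url: str, call_tool, available: set[str]) -> dict:
    attempted = []
    for tool in FETCH_CHAIN:
        if tool not in available:
            attempted.append({"tool": tool, "status": "unavailable"})
            continue
        try:
            content = call_tool(tool, url=url)
            return {"url": url, "content": content, "tool": tool, "attempted": attempted}
        except Exception as err:  # log the failure and degrade gracefully
            attempted.append({"tool": tool, "status": f"failed: {err}"})
    # Nothing worked: note the limitation instead of blocking the research run.
    return {"url": url, "content": None, "tool": None,
            "attempted": attempted, "note": "inaccessible"}
```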

## Step 1: Intent Analysis & Domain Detection

Analyze the conversation to extract research intent and infer domains.

research_intent:
  primary_topic: ""           # Main research question/topic
  explicit_domains: []        # Domains directly mentioned
  inferred_domains: []        # Domains inferred from context
  research_depth: BRIEF | STANDARD | COMPREHENSIVE
  scope_indicators: []        # Keywords suggesting scope (e.g., "overview", "deep dive")

Domain Detection Strategy

Primary Domains (70% effort allocation):
Extract from explicit mentions in conversation:
- Topics directly stated by user
- Subject areas named in research questions
- Fields explicitly referenced

Secondary Domains (30% effort allocation):
Infer from contextual cues:
- Language patterns (technical vs. business vs. creative)
- Concerns expressed (performance, security, UX)
- Artifacts referenced (code, designs, documents)

Domain Inference Examples

User: "Research how to implement event sourcing with Kafka"

Explicit domains:
  - Software Architecture
  - Distributed Systems

Inferred domains:
  - Data Engineering (from "event sourcing")
  - Performance (implied by distributed systems context)

User: "I need to understand the business implications of AI regulation"

Explicit domains:
  - Business
  - Law/Regulation

Inferred domains:
  - Ethics (from "AI regulation")
  - Technology (AI context)

Domain Output Format

domain_analysis:
  primary_domains:
    - domain: "{domain name}"
      confidence: HIGH | MEDIUM | LOW
      rationale: "why detected"
      keywords: ["relevant", "terms"]

  secondary_domains:
    - domain: "{domain name}"
      confidence: HIGH | MEDIUM | LOW
      rationale: "why inferred"
      keywords: ["relevant", "terms"]

## Step 2: Research Planning

Generate a structured research plan with domain-aware effort allocation.

Research Question Generation

For each domain, generate 5-10 targeted search queries:

research_plan:
  domains:
    - domain: "{domain name}"
      effort_allocation: 0.70  # or 0.30 for secondary
      search_queries:
        - "{broad overview query}"
        - "{specific technical query}"
        - "{implementation/practical query}"
        - "{comparison/alternatives query}"
        - "{challenges/limitations query}"
        - "{recent developments query}"
        - "{best practices query}"
        - "{case studies query}"

Query Generation Guidelines

Query Types:
1. Overview: "What is {topic}?", "{topic} overview"
2. Technical: "{topic} implementation", "{topic} how it works"
3. Practical: "{topic} best practices", "{topic} tutorial"
4. Comparative: "{topic} vs {alternative}", "{topic} alternatives"
5. Critical: "{topic} problems", "{topic} limitations"
6. Current: "{topic} 2025", "{topic} latest developments"
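
As a sketch only, these six query types could be expanded mechanically from templates; the helper below is illustrative and not part of the skill:

```python
# Illustrative sketch: expand the six query types for a given topic.
QUERY_TEMPLATES = {
    "overview":    ["What is {topic}?", "{topic} overview"],
    "technical":   ["{topic} implementation", "{topic} how it works"],
    "practical":   ["{topic} best practices", "{topic} tutorial"],
    "comparative": ["{topic} vs alternatives", "{topic} alternatives"],
    "critical":    ["{topic} problems", "{topic} limitations"],
    "current":     ["{topic} 2025", "{topic} latest developments"],
}

def generate_queries(topic: str, limit: int) -> list[str]:
    queries = [t.format(topic=topic)
               for group in QUERY_TEMPLATES.values() for t in group]
    return queries[:limit]

# generate_queries("event sourcing with Kafka", 8)
```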

Effort Allocation

effort_distribution:
  primary_domains: 0.70
  secondary_domains: 0.30

# Per-domain calculation
queries_per_primary_domain: 8-10
queries_per_secondary_domain: 4-6
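
A small worked example of the 70/30 split, assuming a total query budget that is divided across the detected domains (the budget number itself is an assumption, not part of the skill):

```python
# Sketch: divide a total query budget using the 70/30 effort split,
# clamped to the 8-10 / 4-6 per-domain ranges above.
def allocate_queries(primary: list[str], secondary: list[str],
                     total_budget: int = 40) -> dict:
    plan = {}
    primary_share = int(total_budget * 0.70)
    secondary_share = total_budget - primary_share
    for domain in primary:
        plan[domain] = max(8, min(10, primary_share // max(len(primary), 1)))
    for domain in secondary:
        plan[domain] = max(4, min(6, secondary_share // max(len(secondary), 1)))
    return plan

# allocate_queries(["Software Architecture", "Distributed Systems"], ["Data Engineering"])
# -> {"Software Architecture": 10, "Distributed Systems": 10, "Data Engineering": 6}
```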

## Step 3: Information Gathering

Execute parallel research using domain-focused agents.

Tool Availability Check

Use the tool inventory from Step 0 to determine research capabilities:

# Core tools (expected to be available)
core_tools:
  - web_search                    # Brave Search or WebSearchPrime
  - web_reader                    # For content extraction
  - sequential_thinking           # For logic validation

# Browser automation (for dynamic/interactive content)
browser_tools:
  - playwright_navigate           # Navigate to URLs
  - playwright_snapshot           # Capture page state
  - playwright_click              # Interact with elements
  - playwright_fill_form          # Fill out forms
  - playwright_evaluate           # Execute JavaScript
  - browser_screenshot            # Visual capture
  - browser_console_logs          # Debug info
  - agent_browser_skill           # Complex interactions

# Specialized tools (optional)
specialized_tools:
  - documentation_query           # context7 or zread
  - repository_search             # For code/GitHub research

# Fallback strategy
fallback: "Use available tools, gracefully degrade, note limitations"

Research Methodology Selection

Choose methodology based on source types:

Static Content (use web_search + web_reader):
- News articles
- Blog posts
- Academic papers (PDFs)
- Documentation sites
- Wikipedia-style content

Dynamic Content (use browser_automation):
- JavaScript-heavy sites (SPAs)
- Infinite scroll pages
- Content loaded on interaction
- Sites with lazy loading

Interactive Content (use browser_automation + agent_browser):
- Sites requiring login (with authorization)
- Multi-step workflows
- Form submissions
- Search interfaces
- Filtered/paginated results

Paywalled/Gated Content:
- Use browser automation if access authorized
- Note limitations if access unavailable
- Look for alternative sources

Domain Researcher Template

Spawn a researcher for each domain using the Task tool:

Capability: high

You are a {DOMAIN} RESEARCHER. Your role is to gather comprehensive information about {topic} from a {DOMAIN} perspective.

## Research Focus
{domain_specific_context}

## Search Queries to Execute
{list_of_queries}

## Your Task
1. Execute each search query using available search tools
2. Read promising sources using appropriate method:
   - Static content: web-reader
   - Dynamic content: browser automation (Playwright)
   - Interactive content: agent-browser skill
3. Extract key findings, evidence, and sources with URLs
4. Assess source credibility (HIGH/MEDIUM/LOW)
5. Identify consensus vs. debate in the field

## Tool Strategy

**For Standard Web Research:**
- Use WebSearch tools for finding sources
- Use web-reader for extracting static content
- Use documentation queries for technical topics

**For Dynamic/Interactive Sites:**
- Use Playwright tools for:
  - Navigating: `mcp__playwright__browser_navigate`
  - Capturing state: `mcp__playwright__browser_snapshot`
  - Screenshots: `mcp__playwright__browser_take_screenshot`
  - JavaScript execution: `mcp__playwright__browser_evaluate`

**For Complex Interactions:**
- Use agent-browser skill when:
  - Multi-step workflows required
  - Forms need to be filled
  - Login/authentication needed (if authorized)
  - Human-like interaction necessary

**URL Requirements:**
- ALWAYS capture source URLs
- Use actual page URLs (not search result URLs)
- Include direct links for cross-referencing
- Note if URL requires authentication

**Graceful Degradation:**
- If browser tools unavailable, try web-reader
- If web-reader fails, note source as "inaccessible"
- Document tool limitations in findings

## Output Format (JSON)
{
  "agent": "domain-researcher-{domain}",
  "domain": "{domain name}",
  "queries_executed": ["list of queries executed"],
  "findings": [
    {
      "topic": "specific finding",
      "consensus": "STRONG | MODERATE | WEAK | DEBATE",
      "evidence": ["supporting points"],
      "sources": [
        {
          "url": "source URL",
          "title": "source title",
          "credibility": "HIGH | MEDIUM | LOW",
          "type": "academic | industry | blog | documentation | other",
          "date": "publication date if available",
          "key_points": ["extracted insights"]
        }
      ]
    }
  ],
  "contradictions": [
    {
      "topic": "what's debated",
      "viewpoints": ["conflicting perspectives"]
    }
  ],
  "gaps": ["information not found or unclear"]
}

Parallel Execution

Spawn all domain researchers in parallel:

execution_strategy:
  mode: parallel
  max_concurrent: 10
  timeout: 300  # seconds per researcher

researchers:
  - "{primary_domain_1}"
  - "{primary_domain_2}"
  - "{secondary_domain_1}"
  - "{secondary_domain_2}"

Browser-Based Research Crawling

For sources requiring browser automation, use this workflow:

When to Use Browser Automation

Use browser tools when encountering:
- ❌ web-reader returns incomplete content
- ❌ "JavaScript required" messages
- ❌ Dynamic content not loading
- ❌ Search interfaces requiring interaction
- ❌ Paginated results needing navigation
- ❌ Content behind forms or filters

Playwright Research Workflow

Basic Navigation & Extraction:

1. Navigate to URL
   → mcp__playwright__browser_navigate(url)

2. Capture page state
   → mcp__playwright__browser_snapshot
   → Returns: HTML, visible text, accessibility tree

3. Extract specific content
   → mcp__playwright__browser_evaluate(script)
   → Execute JavaScript to extract data

Interactive Research (Forms, Search, Filters):

1. Navigate to site
   → browser_navigate(url)

2. Fill search/filter forms
   → browser_fill_form(selector, value)

3. Click search/submit buttons
   → browser_click(selector)

4. Wait for results to load
   → browser_evaluate("check if loaded")

5. Extract results
   → browser_snapshot or browser_evaluate

Multi-Page Crawling:

For paginated results:
1. Extract page 1
2. Click "Next" button
3. Extract page 2
4. Repeat until complete or limit reached
5. Aggregate all findings
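
A hedged sketch of that pagination loop using Playwright's Python API; the `.result` and `a.next` selectors are placeholders for whatever the target site actually uses, and real crawling should add politeness delays:

```python
from playwright.sync_api import sync_playwright

# Sketch: extract paginated results page by page until "Next" disappears
# or the page limit is reached, then return the aggregated findings.
def crawl_pages(url: str, max_pages: int = 5) -> list[str]:
    findings = []
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, timeout=30_000)
        for _ in range(max_pages):
            findings.extend(el.inner_text() for el in page.query_selector_all(".result"))
            next_button = page.query_selector("a.next")
            if next_button is None:
                break  # no more pages
            next_button.click()
            page.wait_for_load_state("networkidle")
        browser.close()
    return findings
```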

Agent-Browser Skill Integration

For complex multi-step workflows, use the agent-browser skill:

Invoke: /agent-browser or Skill tool with "agent-browser"

Use cases:
- Login flows (if authorized)
- Multi-step forms
- Complex navigation patterns
- Sites requiring human-like timing
- Screenshot capture with analysis

Example agent-browser prompt:

Use agent-browser to:
1. Navigate to {research_site}
2. Search for "{query}"
3. Extract top 10 results with:
   - Title
   - URL
   - Summary text
   - Publication date (if available)
4. Return as structured JSON

URL Capture Requirements

CRITICAL: Always capture actual content URLs:

✅ Correct:

{
  "url": "https://example.com/article/actual-content",
  "title": "Article Title",
  "method": "playwright-browser-automation"
}

❌ Wrong:

{
  "url": "https://google.com/search?q=...",
  "title": "Search results",
  "method": "web-search"
}

URL Validation:
- URL must point to actual content (not search results)
- URL must be clickable/accessible
- URL must be permanent (not session-specific)
- If URL requires auth, note: "access": "requires-authentication"
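
A lightweight sanity check in the same spirit could look like this sketch; the list of search-engine hosts is illustrative, not exhaustive:

```python
from urllib.parse import urlparse

# Sketch: flag URLs that point at search results or look session-specific
# rather than at permanent, clickable content.
SEARCH_HOSTS = ("google.com", "bing.com", "duckduckgo.com")  # illustrative only

def validate_source_url(url: str) -> dict:
    parsed = urlparse(url)
    issues = []
    if parsed.scheme not in ("http", "https"):
        issues.append("not an http(s) URL")
    if any(host in parsed.netloc for host in SEARCH_HOSTS):
        issues.append("points at search results, not actual content")
    if "sessionid" in parsed.query.lower() or "token=" in parsed.query.lower():
        issues.append("looks session-specific, may not be permanent")
    return {"url": url, "valid": not issues, "issues": issues}
```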

Browser Automation Best Practices

Performance:
- Use browser automation only when necessary
- Prefer web-reader for static content
- Limit concurrent browser sessions (max 3)
- Set reasonable timeouts (30s per page)

Ethics & Legal:
- Respect robots.txt
- Honor rate limiting
- Don't bypass paywalls without authorization
- Note access requirements in findings

Error Handling:
- If browser automation fails β†’ try web-reader
- If web-reader fails β†’ note as "inaccessible"
- Log failures in research quality assessment
- Don't block research on single source failure


## Step 4: Cross-Domain Exploration

After domain researchers complete, spawn cross-domain analysts:

Cross-Domain Analyst Template

Capability: high

You are a CROSS-DOMAIN ANALYST. Your role is to explore intersections and connections between domains.

## Domains to Analyze
{list_of_domains}

## Domain Findings Summary
{summary_of_findings_from_each_domain}

## Your Task
1. Identify intersections between domains
2. Find where domains agree and disagree
3. Surface insights that emerge only from cross-domain perspective
4. Identify trade-offs and tensions

## Output Format (JSON)
{
  "agent": "cross-domain-analyst",
  "intersections": [
    {
      "domains": ["domain1", "domain2"],
      "connection": "how they relate",
      "agreements": ["where domains align"],
      "tensions": ["where domains conflict"],
      "emergent_insights": ["insights from intersection"]
    }
  ],
  "domain_mapping": {
    "domain1": ["related domains"],
    "domain2": ["related domains"]
  }
}

## Step 5: Analysis & Synthesis

Analyze findings for quality and generate synthesis.

Validation Strategy (Three-Layer)

Layer 1: Source Credibility Assessment

credibility_criteria:
  HIGH:
    - Academic papers with peer review
    - Official documentation
    - Industry standards bodies
    - Recognized experts in field

  MEDIUM:
    - Industry blogs (established companies)
    - Technical tutorials (reputable sources)
    - Conference presentations
    - Books from known publishers

  LOW:
    - Personal blogs without credentials
    - Forum discussions
    - Social media posts
    - Unverified claims

Layer 2: Cross-Reference Validation

validation_method: triangulation

# Finding is validated if:
triangulation_criteria:
  - Mentioned by 3+ independent sources
  - Appears in HIGH credibility sources
  - Consistent across domains
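
As an illustration, the triangulation check could be expressed as a function over a finding's sources; the sketch below reports which criteria are met rather than deciding how they combine, since the skill lists them jointly:

```python
# Sketch: report which triangulation criteria a finding satisfies.
# `domains_seen_in` is the set of domains whose researchers reported the finding.
def triangulation_report(finding: dict, domains_seen_in: set[str]) -> dict:
    sources = finding.get("sources", [])
    hosts = {s["url"].split("/")[2] for s in sources if "://" in s.get("url", "")}
    return {
        "independent_sources": len(hosts) >= 3,
        "high_credibility_backing": any(s.get("credibility") == "HIGH" for s in sources),
        "cross_domain_consistency": len(domains_seen_in) >= 2,
    }
```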

Layer 3: Internal Consistency Check

Use sequential-thinking tool to validate:
- Logical consistency of findings
- Cause-effect relationships
- Assumption validity

Synthesis Generation

Generate synthesis by:

  1. Identify Patterns: What themes emerge across sources?
  2. Surface Debates: Where do sources disagree?
  3. Assess Evidence: Which findings are well-supported?
  4. Map Connections: How do domains relate?

## Step 6: Recommendations & Reporting

Generate structured report with evidence-based recommendations.

Report Structure

# Deep Research Report: {Topic}

**Generated:** {timestamp}
**Research Duration:** {duration}
**Domains Analyzed:** {list of domains}
**Sources Consulted:** {count}
**Validation Status:** {VALIDATED | PARTIAL | PRELIMINARY}

## Executive Summary

{3-5 sentence overview of key findings and recommendations}

---

## Research Intent & Scope

**Primary Research Question:**
{main question/topic}

**Scope:**
- Primary Domains: {list}
- Secondary Domains: {list}
- Research Depth: {BRIEF | STANDARD | COMPREHENSIVE}

---

## Key Findings by Domain

### {Domain 1}

#### {Finding 1}

**Consensus:** {STRONG | MODERATE | WEAK | DEBATE}

**Evidence:**
- {point 1}
- {point 2}

**Sources:**
- [{Title}]({URL}) - {credibility} - {key insight}
- [{Title}]({URL}) - {credibility} - {key insight}

#### {Finding 2}

{repeat pattern}

---

## Cross-Domain Insights

### Intersection: {Domain 1} + {Domain 2}

**Connection:** {how domains relate}

**Agreements:**
- {where domains align}

**Tensions:**
- {where domains conflict}

**Emergent Insights:**
- {insights from intersection}

---

## Synthesis & Patterns

### Key Patterns
{patterns identified across domains}

### Contradictions & Debates
{areas of disagreement with viewpoints}

### Information Gaps
{what could not be found or needs more research}

---

## Recommendations

### Recommendation 1: {Actionable recommendation}

**Rationale:** {evidence-based reasoning}

**Confidence:** {HIGH | MEDIUM | LOW}

**Supporting Evidence:**
- {finding from domain/source}

### Recommendation 2: {Another recommendation}

{repeat pattern}

---

## Research Quality Assessment

**Validation Method:** Triangulation across sources
**Source Credibility Distribution:**
- HIGH: {count} sources
- MEDIUM: {count} sources
- LOW: {count} sources

**Cross-Domain Validation:**
- {percentage}% of findings validated across multiple domains

**Limitations:**
- {constraints or gaps in research}

---

## Sources Bibliography

### {Domain 1} Sources
1. [{Title}]({URL}) - {credibility} - {date}
2. [{Title}]({URL}) - {credibility} - {date}

### {Domain 2} Sources
{repeat pattern}

---

## Appendix: Research Methodology

**Search Strategy:**
- Total queries executed: {count}
- Domains researched: {list}
- Tools used: {list}

**Quality Controls:**
- Source credibility assessment: ✓
- Cross-reference validation: ✓
- Internal consistency check: ✓

## Step 7: Validate Output Format

Before finalizing the report, validate it against the required format specification to ensure consistency.

Validation Gate

Spawn an output validator sub-agent using the Task tool:

Capability: standard

You are an OUTPUT VALIDATOR for deep-research reports. Your role is to ensure format compliance.

## Files to Validate
- Markdown: {path_to_markdown_file}
- JSON: {path_to_json_file}

## Validation Instructions
Follow the validation procedure defined in: skills/deep-research/validators/output-validator.md

## Schema Location
JSON Schema: skills/deep-research/schemas/research-report-schema.json

## Tasks
1. Load and validate JSON against schema
2. Validate markdown structure and required sections
3. Verify source quality and credibility distribution
4. Cross-check consistency between JSON and markdown
5. Generate validation report

## Output Format
Return validation result as JSON with:
- validation_status: PASS or FAIL
- Specific errors and warnings
- Source quality assessment
- Suggestions for fixes

## Strictness
FAIL on any critical errors:
- Missing required fields
- Invalid enum values
- Type mismatches
- Missing required sections
- Poor source quality (>70% LOW credibility)
- Insufficient sources (<3 total)
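
The core schema check the validator runs could look roughly like this sketch, assuming the schema path above, a Draft 7 schema, and the `jsonschema` package:

```python
import json
from jsonschema import Draft7Validator

# Sketch: validate the JSON report against the bundled schema and collect
# every violation instead of stopping at the first one.
def validate_report(report_path: str,
                    schema_path: str = "skills/deep-research/schemas/research-report-schema.json") -> dict:
    with open(schema_path) as f:
        schema = json.load(f)
    with open(report_path) as f:
        report = json.load(f)
    errors = [f"{'/'.join(map(str, e.path)) or '<root>'}: {e.message}"
              for e in Draft7Validator(schema).iter_errors(report)]
    return {"validation_status": "PASS" if not errors else "FAIL", "errors": errors}
```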

Handling Validation Results

If validation PASSES:
- Proceed to Step 8 (Save Report)

If validation FAILS:
1. Display all errors and warnings to user
2. Provide specific suggestions for each violation
3. DO NOT save report as "latest"
4. Ask user if they want to:
   - Fix the issues and regenerate
   - Override validation (with explicit confirmation)
   - Cancel research

Example failure output:

❌ Validation FAILED

JSON Errors:
- Missing required field: recommendations
- Invalid research_depth value: 'DEEP' (must be BRIEF, STANDARD, or COMPREHENSIVE)
- executive_summary only 35 characters (minimum: 50)

Markdown Errors:
- Missing required section: ## Research Quality Assessment
- Domain 'Machine Learning' in metadata but no findings section

Source Quality Issues:
- Only 2 sources consulted (minimum: 3)
- 80% of sources are LOW credibility (threshold: 70%)
- Only 1 HIGH credibility source (recommended: at least 30%)

Suggestions:
1. Add at least one recommendation with rationale and evidence
2. Change research_depth to BRIEF, STANDARD, or COMPREHENSIVE
3. Expand executive_summary to at least 50 characters
4. Add ## Research Quality Assessment section
5. Add findings section for Machine Learning or remove from domains
6. Conduct more research - consult at least 3 sources total
7. Include more HIGH credibility sources (academic papers, official docs)

Would you like to regenerate the report with corrections?

## Step 8: Save Report

To save the validated report:

1. Create directory: .outputs/research/
2. Save with timestamp: YYYYMMDD-HHMMSS-research-{topic-slug}.md
3. Save JSON version: YYYYMMDD-HHMMSS-research-{topic-slug}.json
4. Update symlink: latest-research.md → most recent report (only if validation passed)

Note: Only update the "latest" symlink for reports that pass validation.

# Output directory structure
.outputs/research/
β”œβ”€β”€ 20250115-143022-research-event-sourcing-kafka.md
β”œβ”€β”€ 20250115-143022-research-event-sourcing-kafka.json
β”œβ”€β”€ 20250116-091545-research-ai-regulation-business.md
β”œβ”€β”€ 20250116-091545-research-ai-regulation-business.json
└── latest-research.md → (symlink to most recent)
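
A minimal sketch of this save step, assuming a simple slug helper and a platform where symlinks are available:

```python
import re
from datetime import datetime
from pathlib import Path

# Sketch: write the timestamped report files and repoint the "latest"
# symlink only when validation passed.
def save_report(markdown: str, json_text: str, topic: str, validated: bool,
                out_dir: str = ".outputs/research") -> Path:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    md_path = out / f"{stamp}-research-{slug}.md"
    md_path.write_text(markdown)
    (out / f"{stamp}-research-{slug}.json").write_text(json_text)
    if validated:  # only validated reports become "latest"
        latest = out / "latest-research.md"
        latest.unlink(missing_ok=True)
        latest.symlink_to(md_path.name)  # relative link within the directory
    return md_path
```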

## Step 9: Configuration (Optional)

The system uses these defaults unless overridden:

Default Configuration

# Research execution
research:
  parallel_execution: true
  max_concurrent_researchers: 10
  timeout_seconds: 300

# Effort allocation
effort_distribution:
  primary_domains: 0.70
  secondary_domains: 0.30

# Queries per domain
queries_per_domain:
  primary: 8-10
  secondary: 4-6

# Output
output_directory: ".outputs/research/"
output_format: "markdown"
include_json: true

Configuration Override Order

Settings are resolved in increasing order of precedence; later sources override earlier ones:

1. Built-in defaults
2. Environment variables
3. Config files in .outputs/research/config.yaml
4. Command-line arguments

Environment Variables

# Execution
export DEEP_RESEARCH_PARALLEL="true"
export DEEP_RESEARCH_MAX_CONCURRENT="10"
export DEEP_RESEARCH_TIMEOUT="300"

# Effort allocation
export DEEP_RESEARCH_PRIMARY_RATIO="0.70"
export DEEP_RESEARCH_SECONDARY_RATIO="0.30"

# Output
export DEEP_RESEARCH_OUTPUT_DIR=".outputs/research/"
export DEEP_RESEARCH_OUTPUT_FORMAT="markdown"
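
A sketch of how these environment overrides could be applied on top of the defaults (the config-file and command-line layers are omitted here):

```python
import os

# Sketch: built-in defaults overridden by the DEEP_RESEARCH_* environment variables.
DEFAULTS = {
    "parallel_execution": True,
    "max_concurrent_researchers": 10,
    "timeout_seconds": 300,
    "primary_ratio": 0.70,
    "secondary_ratio": 0.30,
    "output_dir": ".outputs/research/",
    "output_format": "markdown",
}

def load_config() -> dict:
    cfg = dict(DEFAULTS)
    cfg["parallel_execution"] = os.getenv("DEEP_RESEARCH_PARALLEL", str(cfg["parallel_execution"])).lower() == "true"
    cfg["max_concurrent_researchers"] = int(os.getenv("DEEP_RESEARCH_MAX_CONCURRENT", cfg["max_concurrent_researchers"]))
    cfg["timeout_seconds"] = int(os.getenv("DEEP_RESEARCH_TIMEOUT", cfg["timeout_seconds"]))
    cfg["primary_ratio"] = float(os.getenv("DEEP_RESEARCH_PRIMARY_RATIO", cfg["primary_ratio"]))
    cfg["secondary_ratio"] = float(os.getenv("DEEP_RESEARCH_SECONDARY_RATIO", cfg["secondary_ratio"]))
    cfg["output_dir"] = os.getenv("DEEP_RESEARCH_OUTPUT_DIR", cfg["output_dir"])
    cfg["output_format"] = os.getenv("DEEP_RESEARCH_OUTPUT_FORMAT", cfg["output_format"])
    return cfg
```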

## Notes

  • Model-agnostic: Uses capability levels ("highest", "high", "standard") not specific model names
  • Domain-agnostic: Works for any domain detected from conversation
  • Conversation-driven: Extracts research intent and domains from what was discussed
  • Tool-agnostic: Gracefully handles unavailable tools with fallbacks
  • Evidence-based: All findings tied to sources with credibility assessment
  • Cross-domain: Identifies insights that emerge from domain intersections

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.