greendaygh

prophage-miner

# Install this skill:
npx skills add greendaygh/my-claude-skills --skill "prophage-miner"

Install specific skill from multi-skill repository


# SKILL.md


name: prophage-miner
description: >
  This skill should be used when the user asks for "prophage paper analysis",
  "prophage literature mining", "prophage gene extraction",
  "bacteriophage prophage research", "prophage knowledge graph",
  "prophage 10 iterations", "prophage continuous run",
  or needs automated prophage-related literature collection and analysis.
user_invocable: true


Prophage Miner

PubMed์—์„œ prophage ๊ด€๋ จ ๋…ผ๋ฌธ์„ ์ž๋™ ๊ฒ€์ƒ‰/์„ ์ •ํ•˜๊ณ , PMC full text์—์„œ ์œ ์ „์ž/๋‹จ๋ฐฑ์งˆ/์ˆ™์ฃผ ๊ฐ์—ผ ์ •๋ณด๋ฅผ ์ถ”์ถœํ•˜์—ฌ knowledge graph๋ฅผ ๊ตฌ์ถ•ํ•œ๋‹ค. Pydantic v2 ๊ฒ€์ฆ + 3์ธ ์ „๋ฌธ๊ฐ€ ํŒจ๋„ ํ•ฉ์˜๋กœ ๋ฐ์ดํ„ฐ ํ’ˆ์งˆ์„ ๋ณด์žฅํ•œ๋‹ค. ๋ฐ˜๋ณต ์‹คํ–‰ ์‹œ ๊ธฐ์กด ๋ฐ์ดํ„ฐ์— ์ฆ๋ถ„ ์—…๋ฐ์ดํŠธ๋œ๋‹ค.

Output directory: ~/dev/phage/
Skill location: ~/.claude/skills/prophage-miner/

Prerequisites

์˜์กด์„ฑ ์„ค์น˜ ํ™•์ธ (์ตœ์ดˆ 1ํšŒ):

pip install -r ~/.claude/skills/prophage-miner/requirements.txt

Orchestration

๋‹จ์ผ ์‹คํ–‰

์‚ฌ์šฉ์ž๊ฐ€ "prophage ๋ถ„์„ ์‹คํ–‰" ๋“ฑ์„ ์š”์ฒญํ•˜๋ฉด Phase 1-6์„ ์ˆœ์ฐจ ์ˆ˜ํ–‰ํ•œ๋‹ค.

Repeated runs (N times)

์‚ฌ์šฉ์ž๊ฐ€ "10ํšŒ ๋ฐ˜๋ณต" ๋“ฑ์„ ์ง€์‹œํ•˜๋ฉด for loop์œผ๋กœ Phase 1-6์„ ๋ฐ˜๋ณตํ•œ๋‹ค:

for i in 1..N:
  1. Phase 1: search_papers.py (excludes existing PMIDs, creates the run automatically, and registers the papers) + Pydantic validation
  2. Phase 2: fetch_fulltext.py (only papers not yet downloaded)
  3. Phase 3: delegate to subagents (only papers not yet extracted, 4 in parallel) + Pydantic validation
  4. Phase 4: Expert Panel (Full Panel if i <= 4, Quick Panel if i >= 5)
  5. Phase 5: build_graph.py (rebuilt from approved extractions only) + Pydantic validation
  6. Phase 6: generate_report.py
  7. run_tracker.complete_run(run_id)  # run_id is returned by search_papers
  8. Report: "Run {i}/{N} | cumulative {total} papers | graph {nodes} nodes, {edges} edges"

Finally: print run_tracker.summary()
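The scripted part of this loop can be sketched in Python. This is a minimal sketch, not the skill's own orchestrator. It assumes the commands run from the skill directory with the flags documented in the Phase sections below. `phase_commands` and `run_scripted_phases` are hypothetical helpers. Phases 3-4 (subagent extraction and the expert panel) and `complete_run` are left to a caller-supplied callback. With `dry_run=True` it only collects the command lines.

```python
import subprocess
from pathlib import Path
from typing import Callable, Dict, List

SKILL_DIR = Path.home() / ".claude/skills/prophage-miner"
OUT = Path.home() / "dev/phage"


def phase_commands(out: Path) -> Dict[str, List[str]]:
    """Argv for each scripted phase, using the flags shown in this skill's docs."""
    registry = str(out / "00_config/run_registry.json")
    return {
        "search": ["python", "-m", "scripts.search_papers", "--output", str(out),
                   "--exclude-file", registry, "--select-n", "20"],
        "fetch": ["python", "-m", "scripts.fetch_fulltext",
                  "--input", str(out / "01_papers/paper_list.json"),
                  "--output", str(out), "--pending-only"],
        "graph": ["python", "-m", "scripts.build_graph",
                  "--input", str(out / "02_extractions/per_paper"),
                  "--output", str(out / "03_graph"), "--registry", registry],
        "report": ["python", "-m", "scripts.generate_report",
                   "--input", str(out / "03_graph"), "--output", str(out)],
    }


def run_scripted_phases(n_runs: int, agent_phases: Callable[[int], None],
                        dry_run: bool = True) -> List[List[str]]:
    """Run Phases 1, 2, 5, 6 n_runs times; agent_phases(i) stands in for Phases 3-4."""
    cmds = phase_commands(OUT)
    executed: List[List[str]] = []
    for i in range(1, n_runs + 1):
        for key in ("search", "fetch"):
            executed.append(cmds[key])
            if not dry_run:
                # cwd must be the skill dir so that `-m scripts.*` resolves
                subprocess.run(cmds[key], cwd=SKILL_DIR, check=True)
        agent_phases(i)  # hypothetical hook: subagents + expert panel for run i
        for key in ("graph", "report"):
            executed.append(cmds[key])
            if not dry_run:
                subprocess.run(cmds[key], cwd=SKILL_DIR, check=True)
    return executed
```

`check=True` makes a failing script raise `CalledProcessError` and stop the loop. That is a conservative choice; per-paper failure isolation happens inside Phase 3, not here.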

Output directory initialization

On the first run, create the following directory structure:

mkdir -p ~/dev/phage/{00_config,01_papers/full_texts,02_extractions/per_paper,03_graph/exports,04_analysis,05_reports}
cp ~/.claude/skills/prophage-miner/assets/prophage_schema.json ~/dev/phage/00_config/schema.json
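The same initialization can be done from Python with `pathlib`, for example inside an orchestrator script. This is a sketch; `init_output_dir` is a hypothetical helper. One deliberate difference from the plain `cp`: it does not overwrite an existing `schema.json`. Remove that check if you want `cp` semantics.

```python
import shutil
from pathlib import Path

SUBDIRS = [
    "00_config",
    "01_papers/full_texts",
    "02_extractions/per_paper",
    "03_graph/exports",
    "04_analysis",
    "05_reports",
]


def init_output_dir(root: Path, schema_src: Path) -> None:
    """Create the output tree (like `mkdir -p`) and copy the schema if absent."""
    for sub in SUBDIRS:
        (root / sub).mkdir(parents=True, exist_ok=True)  # safe to call repeatedly
    target = root / "00_config" / "schema.json"
    if schema_src.exists() and not target.exists():
        shutil.copy(schema_src, target)  # keep an existing, possibly edited schema
```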

์‚ฌ์ „ ์ •์˜๋œ ํ‚ค์›Œ๋“œ๋กœ PubMed์„ ๊ฒ€์ƒ‰ํ•˜๊ณ  ๊ธฐ์กด ์ˆ˜์ง‘ PMID๋ฅผ ์ œ์™ธํ•œ ํ›„ ~20ํŽธ์„ ๋žœ๋ค ์„ ์ •ํ•œ๋‹ค.

Run:

cd ~/.claude/skills/prophage-miner
python -m scripts.search_papers \
  --output ~/dev/phage \
  --exclude-file ~/dev/phage/00_config/run_registry.json \
  --select-n 20
# search_papers automatically creates a run and registers the papers in run_registry.
# If the orchestrator has already created the run: --run-id run_002

Pydantic ๊ฒ€์ฆ:

python -m scripts.validate_data --papers ~/dev/phage/01_papers/paper_list.json

๊ฒ€์ฆ ์‹คํŒจ ์‹œ ์—๋Ÿฌ ๋ชฉ๋ก์„ ์ถœ๋ ฅํ•˜๊ณ  ์ž๋™ ์ˆ˜์ •์„ ์‹œ๋„ํ•œ๋‹ค. ์‹ฌ๊ฐํ•œ ์œ„๋ฐ˜ ์‹œ Phase 1์„ ์žฌ์‹คํ–‰ํ•œ๋‹ค.


Phase 2: Full Text Download (automatic, incremental)

has_full_text: false์ธ ๋…ผ๋ฌธ๋งŒ PMC/Europe PMC์—์„œ full text๋ฅผ ๋‹ค์šด๋กœ๋“œํ•œ๋‹ค.

Run:

python -m scripts.fetch_fulltext \
  --input ~/dev/phage/01_papers/paper_list.json \
  --output ~/dev/phage \
  --pending-only

If the download fails, proceed with the abstract only (Phase 3 applies a confidence penalty).
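The section weights and the abstract-only penalty from the Phase 3 subagent prompt can be expressed as a small function. This is a sketch under these assumptions: the penalty is subtracted from the section weight, and the result is clamped at 0 and rounded to two decimals. `section_confidence` is a hypothetical name.

```python
SECTION_WEIGHTS = {
    "results": 0.9,
    "methods": 0.85,
    "abstract": 0.85,
    "introduction": 0.7,
    "discussion": 0.6,
}
ABSTRACT_ONLY_PENALTY = 0.2


def section_confidence(section: str, abstract_only: bool = False) -> float:
    """Base confidence for a fact found in `section`, minus the abstract-only penalty."""
    score = SECTION_WEIGHTS[section.lower()]
    if abstract_only:
        score -= ABSTRACT_ONLY_PENALTY
    return round(max(score, 0.0), 2)  # clamp, then round away float noise
```

For example, a relationship stated only in the abstract of a paper without full text would get 0.85 - 0.2 = 0.65.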


Phase 3: Prophage Extraction (delegated to subagents)

ํ•ต์‹ฌ: ์ด Phase๋Š” ๋ฉ”์ธ ์—์ด์ „ํŠธ๊ฐ€ ์ง์ ‘ ์ถ”์ถœํ•˜์ง€ ์•Š๊ณ , ์„œ๋ธŒ์—์ด์ „ํŠธ์— ์œ„์ž„ํ•˜์—ฌ ์ปจํ…์ŠคํŠธ ์œˆ๋„์šฐ๋ฅผ ๋ณดํ˜ธํ•œ๋‹ค.

Procedure

  1. Query the list of papers that have not been extracted yet:
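import sys
from pathlib import Path  # Path was used without being imported

SKILL_DIR = Path.home() / ".claude/skills/prophage-miner"
sys.path.insert(0, str(SKILL_DIR))  # make the skill's scripts package importable
from scripts.run_tracker import RunTracker

tracker = RunTracker(Path.home() / "dev/phage")  # output root
pending = tracker.get_pending_extractions()  # papers not yet extracted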
  1. pending ๋…ผ๋ฌธ์„ 4ํŽธ์”ฉ ๋ฌถ์–ด ๋ณ‘๋ ฌ ์„œ๋ธŒ์—์ด์ „ํŠธ์— ์œ„์ž„ํ•œ๋‹ค (Task ๋„๊ตฌ, subagent_type="generalPurpose"):

๊ฐ ์„œ๋ธŒ์—์ด์ „ํŠธ์— ์ „๋‹ฌํ•  ํ”„๋กฌํ”„ํŠธ:

You are a prophage biology extraction specialist.

1. Read ~/.claude/skills/prophage-miner/references/extraction_prompts.md for extraction guidelines.
2. Read ~/.claude/skills/prophage-miner/references/prophage_biology.md for domain context.
3. Read ~/dev/phage/00_config/schema.json for the entity/relationship schema.
4. Read ~/dev/phage/01_papers/full_texts/{paper_id}.txt for the full text.
5. Extract ALL prophage-related entities and relationships following the schema.
6. Apply section-based confidence weights:
   - Results: 0.9, Methods: 0.85, Abstract: 0.85, Introduction: 0.7, Discussion: 0.6
   - Abstract-only papers: apply -0.2 penalty
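7. Save the extraction result (run from the skill directory so that the scripts package is importable):
   cd ~/.claude/skills/prophage-miner && python -m scripts.extract_prophage save \
     --paper-id {paper_id} \
     --output ~/dev/phage/02_extractions/per_paper/
   Provide the extraction JSON via stdin.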
8. Return a brief summary ONLY: entity count, relationship count, key prophage names found.
  1. ์„œ๋ธŒ์—์ด์ „ํŠธ ์™„๋ฃŒ ํ›„, ๋ฉ”์ธ ์—์ด์ „ํŠธ๊ฐ€ ์ƒํƒœ๋ฅผ ์—…๋ฐ์ดํŠธํ•œ๋‹ค:
tracker.mark_extracted(paper_id)
# ๋˜๋Š” ์‹คํŒจ ์‹œ:
tracker.mark_extract_failed(paper_id, "reason")
  4. Pydantic validation (for each extraction result):
python -m scripts.validate_data --extraction ~/dev/phage/02_extractions/per_paper/{paper_id}_extraction.json
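The batching in step 2 and the status bookkeeping in step 3 can be sketched as below. Two assumptions apply: `get_pending_extractions()` returns a list, and each subagent's outcome is collected into a dict mapping paper_id to `None` (success) or a failure reason. `batches` and `record_results` are hypothetical helpers; only `mark_extracted` and `mark_extract_failed` come from this skill.

```python
from typing import Dict, Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def batches(items: Sequence[T], size: int = 4) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `size` items (the last may be shorter)."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def record_results(tracker, results: Dict[str, Optional[str]]) -> None:
    """Mark each paper extracted or failed; one failure never blocks the others."""
    for paper_id, error in results.items():
        if error is None:
            tracker.mark_extracted(paper_id)
        else:
            tracker.mark_extract_failed(paper_id, error)
```

Recording results per paper, rather than per batch, keeps the failure isolation described under Stability Notes.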

Phase 4: Expert Panel Review (open discussion among three experts + consensus)

Read ~/.claude/skills/prophage-miner/references/panel_protocol.md for the full protocol.
Read ~/.claude/skills/prophage-miner/assets/panel_config.json for panel configuration.

Full Panel (default, first 4 runs)

Round 1 - ๋…๋ฆฝ ๊ฒ€ํ†  (3์ธ ๋ณ‘๋ ฌ ์„œ๋ธŒ์—์ด์ „ํŠธ):

๊ฐ ์ „๋ฌธ๊ฐ€์—๊ฒŒ ์ „๋‹ฌํ•  ์ž…๋ ฅ:
- ์ด๋ฒˆ run์˜ ์ถ”์ถœ ์š”์•ฝ (์—”ํ‹ฐํ‹ฐ/๊ด€๊ณ„ ํƒ€์ž…๋ณ„ ๊ฐœ์ˆ˜ + ๋Œ€ํ‘œ ์˜ˆ์‹œ)
- ์‹ ๋ขฐ๋„ ๋ถ„ํฌ (min, max, mean)
- unschemaed ๋ฐœ๊ฒฌ ๋ชฉ๋ก

๊ฐ ์ „๋ฌธ๊ฐ€(์„œ๋ธŒ์—์ด์ „ํŠธ) ํ”„๋กฌํ”„ํŠธ:

You are {expert_name}, {persona_description}.
Review the following extraction summary and evaluate:
- Entity accuracy (accept/flag with reason)
- Relationship plausibility (accept/flag with reason)
- Missing entities or relationships
- Schema improvement suggestions

Input: {extraction_summary}

Output your assessment as JSON with: assessments (per paper), schema_suggestions, overall_quality.

Round 2 - ์ž์œ  ํ† ๋ก  (์ˆœ์ฐจ ์„œ๋ธŒ์—์ด์ „ํŠธ):
- Round 1์˜ 3์ธ ์˜๊ฒฌ์„ ๋ชจ๋‘ ๊ณต๊ฐœ
- ๊ฐ ์ „๋ฌธ๊ฐ€๊ฐ€ ๋‹ค๋ฅธ ์˜๊ฒฌ์„ ์ฐธ๊ณ ํ•˜์—ฌ ์žฌํ‰๊ฐ€
- ์ตœ๋Œ€ 2ํšŒ ์™•๋ณต

Round 3 - ํ•ฉ์˜ ํˆฌํ‘œ:
- ๊ฐ ์ „๋ฌธ๊ฐ€ ์ตœ์ข… ํŒ์ •: accept / flag_recheck / flag_reextract / reject
- 2/3 ์ด์ƒ ๋™์˜ ์‹œ ํ•ฉ์˜๋กœ ํ™•์ •

Quick Panel (5th consecutive run onward)

์กฐ๊ฑด: ์—ฐ์† 5ํšŒ ์ด์ƒ + ํ‰๊ท  panel_confidence >= 0.8
- Round 1๋งŒ ์ˆ˜ํ–‰
- ์‹ฌ๊ฐํ•œ flag๊ฐ€ ์—†์œผ๋ฉด ์ž๋™ ์Šน์ธ
- ์‹ฌ๊ฐํ•œ flag ๋ฐœ๊ฒฌ ์‹œ ์ฆ‰์‹œ Full Panel๋กœ ๋ณต๊ท€

ํŒ์ • ์ฒ˜๋ฆฌ

  • accept: include in the graph
  • flag_reextract: reset extraction_status to "pending" → re-extracted in Phase 3
  • flag_recheck: the main agent corrects the extraction directly, then re-validates it
  • reject: delete the extraction, set extraction_status to "rejected", and exclude it from the graph
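The status side of these rules can be written as a lookup table. In this sketch "pending" and "rejected" come from the rules above. "approved" and "needs_recheck" are hypothetical placeholder names. Deleting the extraction file on reject is not shown. `apply_verdict` returns an updated copy instead of mutating the registry record.

```python
from typing import Dict

# "pending" and "rejected" are from this skill's docs;
# "approved" and "needs_recheck" are hypothetical placeholders.
VERDICT_TO_STATUS: Dict[str, str] = {
    "accept": "approved",
    "flag_reextract": "pending",
    "flag_recheck": "needs_recheck",
    "reject": "rejected",
}


def apply_verdict(paper: dict, verdict: str) -> dict:
    """Return a copy of the paper record with extraction_status set for the verdict."""
    if verdict not in VERDICT_TO_STATUS:
        raise ValueError(f"unknown verdict: {verdict}")
    updated = dict(paper)
    updated["extraction_status"] = VERDICT_TO_STATUS[verdict]
    return updated
```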

Phase 5: Knowledge Graph Construction (automatic, idempotent)

์Šน์ธ๋œ ์ถ”์ถœ ๊ฒฐ๊ณผ๋งŒ ํ†ตํ•ฉํ•˜์—ฌ ๊ทธ๋ž˜ํ”„๋ฅผ ์žฌ๊ตฌ์ถ•ํ•œ๋‹ค.

Run:

python -m scripts.build_graph \
  --input ~/dev/phage/02_extractions/per_paper \
  --output ~/dev/phage/03_graph \
  --registry ~/dev/phage/00_config/run_registry.json

Pydantic ๊ฒ€์ฆ:

python -m scripts.validate_data --graph ~/dev/phage/03_graph/

Referential integrity check: verify that every edge's from_id and to_id point to an existing node.
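A referential integrity check reduces to a set lookup. This sketch assumes each node record carries its identifier under an `id` key. The `from_id`/`to_id` edge fields are the ones named above. `dangling_edges` is a hypothetical helper, not part of `validate_data`.

```python
from typing import Iterable, List


def dangling_edges(nodes: Iterable[dict], edges: Iterable[dict]) -> List[dict]:
    """Return edges whose from_id or to_id does not match any node id."""
    node_ids = {n["id"] for n in nodes}  # O(1) membership checks
    return [e for e in edges
            if e["from_id"] not in node_ids or e["to_id"] not in node_ids]
```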


Phase 6: Analysis & Report (automatic, idempotent)

๊ทธ๋ž˜ํ”„ ๋ฐ์ดํ„ฐ๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ๋ถ„์„ ์นดํƒˆ๋กœ๊ทธ์™€ ๋ฆฌํฌํŠธ๋ฅผ ์ƒ์„ฑํ•œ๋‹ค.

Run:

python -m scripts.generate_report \
  --input ~/dev/phage/03_graph \
  --output ~/dev/phage

์ถœ๋ ฅ ํŒŒ์ผ:
- 04_analysis/prophage_catalog.json: ๋ฐœ๊ฒฌ๋œ prophage ์นดํƒˆ๋กœ๊ทธ
- 04_analysis/host_range_matrix.json: ํ˜ธ์ŠคํŠธ-ํŒŒ์ง€ ๊ฐ์—ผ ๋ฒ”์œ„
- 04_analysis/gene_inventory.json: ์œ ์ „์ž/๋‹จ๋ฐฑ์งˆ ์ธ๋ฒคํ† ๋ฆฌ
- 05_reports/research_report.md: ์˜๋ฌธ ์—ฐ๊ตฌ ๋ฆฌํฌํŠธ
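A host-range view can be derived from the graph's INFECTS edges. This sketch is an illustration only. It assumes edges carry a `type` key and point from a prophage to a host. It produces an adjacency list (prophage to sorted hosts) and does not reproduce the actual host_range_matrix.json format. `host_range` is a hypothetical helper.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Set


def host_range(edges: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each prophage id to the sorted, de-duplicated host ids it INFECTS."""
    hosts: DefaultDict[str, Set[str]] = defaultdict(set)
    for e in edges:
        if e["type"] == "INFECTS":
            hosts[e["from_id"]].add(e["to_id"])
    return {phage: sorted(hs) for phage, hs in sorted(hosts.items())}
```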


Run Completion

At the end of every run:

tracker.complete_run(run_id)
s = tracker.summary()

๋ˆ„์  ํ†ต๊ณ„ ์ถœ๋ ฅ:

Run {i}/{N} completed
Total: {total_papers} papers | Extracted: {extracted} | Failed: {failed}
Graph: {nodes} nodes, {edges} edges
Panel confidence: {confidence}

Schema Reference

์Šคํ‚ค๋งˆ ํŒŒ์ผ: ~/dev/phage/00_config/schema.json

Entity types (8)

Prophage, Gene, Protein, Host, IntegrationSite, Receptor, InductionCondition, Paper

Relationship types (10)

ENCODES, TRANSLATES_TO, INTEGRATES_INTO, INFECTS, BINDS, REPRESSES, INDUCES, HOMOLOGOUS_TO, LYSIS_COMPONENT, EXTRACTED_FROM

์ƒ์„ธ ์ •์˜๋Š” ~/.claude/skills/prophage-miner/assets/prophage_schema.json ์ฐธ์กฐ.


Stability Notes

  • ์ปจํ…์ŠคํŠธ ๋ณดํ˜ธ: ๋ฉ”์ธ ์—์ด์ „ํŠธ๋Š” full text๋ฅผ ์ง์ ‘ ๋กœ๋“œํ•˜์ง€ ์•Š์Œ. ์Šคํฌ๋ฆฝํŠธ ์‹คํ–‰ + ์„œ๋ธŒ์—์ด์ „ํŠธ ์œ„์ž„๋งŒ ์ˆ˜ํ–‰
  • ํŒŒ์ผ ๊ธฐ๋ฐ˜ ์ƒํƒœ: ๋ชจ๋“  ์ƒํƒœ๊ฐ€ run_registry.json์— ์˜์†์ ์œผ๋กœ ์ €์žฅ
  • ์‹คํŒจ ๊ฒฉ๋ฆฌ: ์„œ๋ธŒ์—์ด์ „ํŠธ ์‹คํŒจ ์‹œ ํ•ด๋‹น ๋…ผ๋ฌธ๋งŒ failed๋กœ ํ‘œ์‹œ, ๋‚˜๋จธ์ง€ ๊ณ„์† ์ง„ํ–‰
  • ์žฌ์‹œ๋„: ๋‹ค์Œ run์—์„œ failed ๋…ผ๋ฌธ์„ ์ž๋™ ์žฌ์‹œ๋„ (pending์œผ๋กœ ๋ณต์›)
  • Idempotent Phase 5/6: ๊ทธ๋ž˜ํ”„/๋ฆฌํฌํŠธ๋Š” ํ•ญ์ƒ ์ „์ฒด ์žฌ๊ตฌ์ถ•์ด๋ฏ€๋กœ ์–ธ์ œ๋“  ์•ˆ์ „
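The retry rule (failed → pending at the start of the next run) and file-based state can be sketched together. The registry layout `{"papers": {paper_id: {...}}}` is a hypothetical assumption; the real run_registry.json structure is not documented here. `requeue_failed` is also hypothetical. Writing to a temporary file and renaming it avoids leaving a half-written registry if the process dies mid-write. The rename is atomic only on the same filesystem.

```python
import json
from pathlib import Path


def requeue_failed(registry_path: Path) -> int:
    """Reset every failed paper to pending so the next run retries it; return the count."""
    registry = json.loads(registry_path.read_text(encoding="utf-8"))
    count = 0
    for paper in registry.get("papers", {}).values():  # hypothetical layout
        if paper.get("extraction_status") == "failed":
            paper["extraction_status"] = "pending"
            count += 1
    tmp = registry_path.with_suffix(".tmp")
    tmp.write_text(json.dumps(registry, indent=2), encoding="utf-8")
    tmp.replace(registry_path)  # rename over the original in one step
    return count
```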
