document-conversion

by @linxule in AI & LLM

# Install this skill:

npx skills add linxule/interpretive-orchestration --skill "document-conversion"

Install specific skill from multi-skill repository

# Description

This skill should be used when a user needs to convert PDFs (especially ones with tables or figures), mentions 'convert', 'PDF', or 'document processing', has complex academic papers to import, or asks about MinerU vs Markdownify.

# SKILL.md

name: document-conversion
description: "This skill should be used when a user needs to convert PDFs (especially ones with tables or figures), mentions 'convert', 'PDF', or 'document processing', has complex academic papers to import, or asks about MinerU vs Markdownify."

document-conversion

Robust PDF and document conversion with intelligent tool selection. Chooses the best available conversion method based on document complexity and MCP availability.

When to Use

Use this skill when:
- User needs to convert PDFs, especially with tables or figures
- User mentions "convert", "PDF", "document processing"
- User has complex academic papers to import
- User asks about MinerU vs Markdownify

MCP Comparison

| Feature | MinerU (Optional) | Markdownify (Bundled) |
|---|---|---|
| API Key Required | Yes (MINERU_API_KEY) | No |
| PDF Accuracy | 90%+ (VLM mode) | Good |
| Table Extraction | Excellent | Basic |
| Figure Handling | Extracts + describes | Basic |
| Formula Recognition | Yes | Limited |
| Multi-column | Excellent | Good |
| Audio Transcription | No | Yes |
| Cost | Pay per page | Free |

When to Use Which

Use MinerU When:

- PDF has complex tables with merged cells
- Document has multi-column layouts
- Figures/charts need extraction
- Mathematical formulas are present
- Academic paper with structured formatting
- Accuracy is critical

Use Markdownify When:

- Document is simple and text-based
- Audio files need transcription
- No API key is available
- Cost is a concern
- Document is straightforward

Tool Selection Logic

Is the document a PDF with tables/figures?
├── Yes, complex tables
│   └── MinerU available?
│       ├── Yes → Use MinerU (vlm mode)
│       └── No → Markdownify + manual review
├── Yes, simple formatting
│   └── Markdownify (good enough)
└── No, other format
    └── Is it audio?
        ├── Yes → Markdownify
        └── No → Markdownify (supports many formats)
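
The decision tree above can be sketched as a small function. This is only an illustration of the branching: the `Document` class, its fields, and `choose_tool` are hypothetical names, not part of either MCP.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """Hypothetical description of a file to convert."""
    is_pdf: bool
    has_complex_tables: bool = False  # e.g. merged cells, formulas, dense figures
    is_audio: bool = False


def choose_tool(doc: Document, mineru_available: bool) -> str:
    """Walk the decision tree and return a short recommendation."""
    if doc.is_pdf and doc.has_complex_tables:
        if mineru_available:
            return "MinerU (vlm mode)"
        return "Markdownify + manual review"
    # Simple PDFs, audio files, and every other format end at Markdownify.
    return "Markdownify"
```

For example, `choose_tool(Document(is_pdf=True, has_complex_tables=True), mineru_available=False)` returns the "Markdownify + manual review" branch.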

Usage Examples

MinerU (Complex PDF)

Use mineru_parse to convert this academic paper:
- URL: https://example.com/paper.pdf
- Model: vlm (for 90%+ accuracy)
- Enable: formula and table recognition

Markdownify (Simple Document)

Use markdownify pdf-to-markdown for this interview guide

Batch Processing

For multiple PDFs:
1. Check which PDFs have complex tables
2. Process the simple ones with Markdownify
3. Queue the complex ones for a MinerU batch
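
A minimal sketch of that triage step, assuming you have already decided by inspection which PDFs have complex tables. The `triage` function and the file names in the usage note are hypothetical.

```python
def triage(pdfs: dict[str, bool]) -> tuple[list[str], list[str]]:
    """Split PDFs into (markdownify_now, mineru_queue).

    `pdfs` maps a file path to True when that PDF has complex tables.
    """
    markdownify_now = [path for path, is_complex in pdfs.items() if not is_complex]
    mineru_queue = [path for path, is_complex in pdfs.items() if is_complex]
    return markdownify_now, mineru_queue
```

Calling `triage({"guide.pdf": False, "paper.pdf": True})` puts `guide.pdf` in the first list and `paper.pdf` in the MinerU queue.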

MinerU Specific Features

VLM vs Pipeline Mode

- VLM Mode: Uses a vision-language model, 90%+ accuracy, slower
- Pipeline Mode: Traditional parsing, faster, lower accuracy

Page Selection

Parse only specific pages:
mineru_parse({
  url: "https://...",
  pages: "1-10,15,20-25"
})
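
To check a page specification before sending it, the range string can be expanded locally. This sketch assumes pages are 1-based and ranges are inclusive, which is a guess about how the `pages` string is read rather than documented MinerU behavior. `expand_pages` is a hypothetical helper.

```python
def expand_pages(spec: str) -> list[int]:
    """Expand a spec such as "1-10,15,20-25" into a sorted list of page numbers."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(start, end + 1))  # inclusive on both ends
        else:
            pages.add(int(part))
    return sorted(pages)
```

Under those assumptions, `expand_pages("1-10,15,20-25")` yields 17 pages, which is useful for estimating per-page cost before a run.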

Batch Processing

Process multiple documents:
mineru_batch({
  urls: ["url1", "url2", "url3"],
  model: "vlm"
})

Output Quality Checklist

After conversion, verify:
- [ ] Text is accurately extracted
- [ ] Tables maintain structure
- [ ] Headers/sections are correct
- [ ] Figures have descriptions (if MinerU)
- [ ] Formulas are readable (if MinerU)
- [ ] No garbled text from OCR errors
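
Some of these checks can be roughed out automatically on the Markdown output. The sketch below is a heuristic only, with a hypothetical `check_markdown` helper. It flags empty output, Unicode replacement characters, missing headings, and pipe tables whose rows differ in cell count, and it does not replace reading the result.

```python
def check_markdown(text: str) -> list[str]:
    """Return a list of warnings; an empty list means nothing was flagged."""
    warnings = []
    lines = text.splitlines()
    if not text.strip():
        warnings.append("output is empty")
    # U+FFFD is the Unicode replacement character, a common sign of decoding damage.
    if "\ufffd" in text:
        warnings.append("replacement characters found")
    if not any(line.startswith("#") for line in lines):
        warnings.append("no Markdown headings found")
    # Rows of one pipe table should all have the same number of cells.
    uneven_table = False
    widths: set[int] = set()
    for line in lines + [""]:  # trailing sentinel closes a table at end of text
        stripped = line.strip()
        if stripped.startswith("|"):
            widths.add(stripped.strip("|").count("|") + 1)
        else:
            if len(widths) > 1:
                uneven_table = True
            widths = set()
    if uneven_table:
        warnings.append("table rows have inconsistent column counts")
    return warnings
```

Escaped pipes inside cells are not handled, so treat a table warning as a prompt to look, not as proof of a broken table.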

Integration with Research Workflow

For Literature (Stream A)

1. Identify papers to convert
2. Complex papers → MinerU
3. Simple papers → Markdownify
4. Store in stream-a-theoretical/papers/

For Data Documents (Stream B)

- Interview transcripts → Markdownify (audio)
- PDF field notes → Markdownify or MinerU
- Store in appropriate stage folder

Fallback Options

If both tools fail or are unavailable:

1. Adobe Acrobat - Export to Word
2. Google Docs - Open PDF for auto-OCR
3. Tesseract OCR - Command-line tool
4. Manual transcription - Last resort

Related

- MCPs: MinerU (optional), Markdownify (bundled)
- Skills: interview-ingest for audio, literature-sweep for papers
- Configuration: .mcp.json defines MCP availability
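
One way to check which MCPs are configured is to read `.mcp.json` directly. This sketch assumes the file has a top-level `mcpServers` object keyed by server name. The server name `mineru` in the usage note is also an assumption, so check it against your own configuration.

```python
import json
from pathlib import Path


def configured_servers(config: dict) -> set[str]:
    """Return the server names listed under the assumed "mcpServers" key."""
    return set(config.get("mcpServers", {}))


def available_mcps(config_path: str = ".mcp.json") -> set[str]:
    """Load the config file and list its servers; a missing file means none."""
    path = Path(config_path)
    if not path.is_file():
        return set()
    return configured_servers(json.loads(path.read_text(encoding="utf-8")))
```

With this in place, `"mineru" in available_mcps()` gives a rough answer to the "MinerU available?" branch of the selection logic. It shows only that the server is configured, not that its API key is valid.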

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents:

⚡ Amp 🚀 Antigravity 🤖 Claude Code 🦀 Clawdbot 📝 Codex ▶️ Cursor 🤖 Droid 💎 Gemini CLI 🐙 GitHub Copilot 🪿 Goose 📊 Kilo Code 🔧 Kiro CLI 💻 OpenCode 🦘 Roo Code 🌲 Trae 🏄 Windsurf

