Install this specific skill from the multi-skill repository:

`npx skills add jmservera/agent-skills --skill "transcribe-pdf"`
# Description
Render a scanned PDF to per-page images, auto-crop UI/header bars, then transcribe page-by-page using LLM vision (no OCR) into a single Markdown file.
# SKILL.md
name: transcribe-pdf
description: Render a scanned PDF to per-page images, auto-crop UI/header bars, then transcribe page-by-page using LLM vision (no OCR) into a single Markdown file.
## Transcribe a scanned PDF to Markdown (vision-only)

### Make it autonomous (reduce prompts)
This skill writes images and a running `transcription.md`. If your Copilot CLI asks for approval on each file write, have the user run `/allow-all` (or `/yolo`) once at the start of the session so you can proceed without repeated interruptions.
### Hard rules (must follow)
- NO OCR: do not run OCR tools or OCR libraries. Transcription must be done using LLM vision.
- Whole document, one page at a time: render/prepare images for the whole PDF once, but transcribe sequentially page-by-page (never batch multiple pages into one vision prompt).
- Don’t overdo it: transcribe what’s legible in the page image; use placeholders for uncertain/unreadable parts.
- No zoom / no slicing unless necessary: prefer the already-cropped full-page image.
- Do not stop to ask: after starting, continue through all pages, appending as you go.
### Why cropping matters (key finding)
Many archival viewers add header/footer UI bars (e.g., “PARES …”). These reduce effective resolution and can confuse transcription.
Always crop those bars out before transcribing.
### Workflow

#### 0) Install dependencies (once)

`pip install -r "<skill_root>\scripts\requirements.txt"`
#### 1) (Optional) Confirm there is no usable text layer

This is a quick sanity check to avoid wasting time trying to extract text that isn't there.

`python "<skill_root>\scripts\check_text_layer_sample.py" "<pdf_path>"`

If the samples show ~0 characters or only metadata, proceed with image transcription. A sketch of such a check follows.
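The script's internals aren't reproduced in this listing, but a minimal sketch of such a check, assuming PyMuPDF as the PDF library (the function name and sample size are illustrative), could look like this:

```python
import fitz  # PyMuPDF

def sample_text_layer(pdf_path: str, sample_pages: int = 3) -> None:
    """Print embedded-text character counts for a few sample pages."""
    doc = fitz.open(pdf_path)
    for i in range(min(sample_pages, doc.page_count)):
        # get_text() reads the embedded text layer only; no OCR is run.
        chars = len(doc[i].get_text().strip())
        print(f"page {i + 1}: {chars} chars")
    doc.close()

sample_text_layer("scan.pdf")
```

Near-zero counts across the samples mean the PDF is image-only and vision transcription is the right path.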
#### 2) Prepare images for the whole PDF (single command)

If the workspace already contains correctly cropped `page-*.png` images, skip to Step 3.

Recommended: render + crop in one run.

`python "<skill_root>\scripts\render_and_crop_pdf.py" "<pdf_path>" "<out_dir>" --zoom 2.0`

Outputs:

- `<out_dir>\cropped\page-01.png`, `page-02.png`, … (the canonical inputs for transcription)

Crop defaults are tuned to remove common header/footer bars at `--zoom 2.0`:

- `--top 280 --bottom 140` (adjust if needed)

A sketch of this render-and-crop pass follows.
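The actual `render_and_crop_pdf.py` isn't shown in this listing; the following is only a sketch of what a render-then-crop pass can look like, assuming PyMuPDF and Pillow as the underlying libraries:

```python
import io
import pathlib

import fitz  # PyMuPDF
from PIL import Image

def render_and_crop(pdf_path, out_dir, zoom=2.0, top=280, bottom=140):
    """Render each page at `zoom`, then trim fixed-height header/footer bars."""
    out = pathlib.Path(out_dir) / "cropped"
    out.mkdir(parents=True, exist_ok=True)
    doc = fitz.open(pdf_path)
    for i, page in enumerate(doc, start=1):
        # Render at 2x resolution, matching the documented --zoom default.
        pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))
        img = Image.open(io.BytesIO(pix.tobytes("png")))
        w, h = img.size
        # Trim the viewer's header/footer bars; 280/140 px match the
        # documented --top/--bottom defaults at this zoom.
        img.crop((0, top, w, h - bottom)).save(out / f"page-{i:02d}.png")
    doc.close()
```

Cropping by a fixed pixel height only works at a fixed zoom, which is why the defaults are tied to `--zoom 2.0`.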
#### 3) Transcribe the whole document, one page at a time (LLM vision)

Create the transcript file if it doesn't exist, then append each page immediately (never hold results in chat).

- Transcript path: `<out_dir>\transcription.md`
- For each page image in `<out_dir>\cropped`:
  - Open/view exactly one image (e.g., `page-21.png`).
  - Transcribe what you can see into Markdown.
  - Append immediately to the transcript under a heading `## Page N`.
  - Move to the next page and repeat until the final page.

A minimal sketch of this loop appears below.
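Since the transcription pass is driven by the agent rather than a script, the control flow below is only a sketch; `transcribe_page_with_vision` is a hypothetical stand-in for whatever single-image vision call your agent makes:

```python
import pathlib

def transcribe_page_with_vision(image_path: pathlib.Path) -> str:
    """Hypothetical: send exactly one page image to the LLM vision model."""
    raise NotImplementedError("wire this to your agent's vision call")

def transcribe_document(out_dir: str) -> None:
    """Append one page at a time to transcription.md; never batch pages."""
    out = pathlib.Path(out_dir)
    transcript = out / "transcription.md"
    transcript.touch(exist_ok=True)  # create the file if it doesn't exist
    pages = sorted((out / "cropped").glob("page-*.png"))
    for n, image in enumerate(pages, start=1):
        page_md = transcribe_page_with_vision(image)
        # Append immediately so no transcription is held back in chat.
        with transcript.open("a", encoding="utf-8") as f:
            f.write(f"## Page {n}\n\n{page_md}\n\n")
```

Appending after every page means an interruption loses at most one page of work.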
### Transcription conventions
- Preserve original spelling/orthography when possible (old Spanish is expected).
- Use placeholders instead of guessing:
  - `[ilegible]` for unreadable words/lines
  - `[¿…?]` for uncertain readings
- If a page is blank or non-text: write `[Página en blanco]` or a short note.

An invented example using these conventions follows.
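For illustration only, the old-Spanish text below is invented; a transcript entry that follows these conventions might read:

```markdown
## Page 21

Sepan quantos esta carta vieren como yo, [ilegible], vezino de la
villa de [¿Ocaña?], otorgo e conosco que...

## Page 22

[Página en blanco]
```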
### Helper scripts included

- `render_pdf_to_images.py`: render PDF pages to PNGs.
- `render_and_crop_pdf.py`: render + crop in one command (preferred).
- `crop_images.py`: crop an existing set of `page-*.png` images.
- `check_text_layer_sample.py`: sample the embedded PDF text layer (non-OCR).
# Supported AI Coding Agents
This skill is compatible with the SKILL.md standard and works with all major AI coding agents that support it.