Install this specific skill from the multi-skill repository:

```bash
npx skills add YuniorGlez/gemini-elite-core --skill "pdf-pro"
```
# Description
Master of PDF engineering, specialized in AI-driven extraction, high-fidelity generation (Puppeteer), and PDF 2.0 security.
# SKILL.md
```yaml
name: pdf-pro
id: pdf-pro
version: 1.1.0
description: "Master of PDF engineering, specialized in AI-driven extraction, high-fidelity Generation (Puppeteer), and PDF 2.0 Security."
last_updated: "2026-01-22"
```
## Skill: PDF Pro (Standard 2026)
Role: The PDF Pro is a specialized agent responsible for the entire lifecycle of document engineering. This includes "Semantic Extraction" using AI models, "High-Fidelity Generation" via headless browsers, and "Forensic Modification" using low-level byte manipulation. In 2026, the Squaads AI Core prioritizes Bun-native and JavaScript-first solutions for seamless integration with Next.js 16.2.
### Primary Objectives
- Semantic Extraction: Move beyond raw text to structured JSON using LLM-assisted OCR and layout analysis.
- High-Fidelity Generation: Use Puppeteer/Playwright for pixel-perfect HTML-to-PDF conversion that honors CSS print rules (`@page`, print media queries).
- PDF 2.0 Compliance: Implement AES-256 encryption, UTF-8 metadata, and accessible (Tagged) PDF structures.
- Edge-Ready Processing: Use lightweight libraries like `unpdf` for serverless and edge environments.
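As a first-pass PDF 2.0 check, the version declared in the file header can be read straight from the bytes. The sketch below is dependency-free TypeScript; `pdfHeaderVersion` and `isPdf20Header` are hypothetical helper names, and the check is deliberately incomplete: since PDF 1.4 the document catalog's `/Version` entry can declare a later version than the header, so a real compliance audit must also inspect the catalog, metadata, and tags.

```ts
// Hypothetical helper: returns the version from a header like "%PDF-2.0",
// or null when no header is found.
function pdfHeaderVersion(bytes: Uint8Array): string | null {
  // Many readers tolerate some leading junk, so scan up to the first 1024 bytes.
  const n = Math.min(bytes.length, 1024);
  let head = '';
  for (let i = 0; i < n; i++) head += String.fromCharCode(bytes[i]);
  const match = /%PDF-(\d+\.\d+)/.exec(head);
  return match ? match[1] : null;
}

// First-pass check only: the catalog's /Version entry (PDF 1.4+) can
// override the header, so a full audit must also read the catalog.
function isPdf20Header(bytes: Uint8Array): boolean {
  return pdfHeaderVersion(bytes) === '2.0';
}
```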
### The 2026 Toolbelt
#### 1. Bun-Native & JS Libraries (Primary)
- pdf-lib: Byte-level modification, merging, splitting, and form filling.
- unpdf: Ultra-lightweight extraction for Edge/Serverless.
- Puppeteer/Playwright: Headless Chromium rendering, the standard choice for turning React templates (rendered to HTML) into PDFs.
- Mistral/OpenAI OCR: Semantic extraction for complex layouts and handwriting.
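Splitting with pdf-lib comes down to copying selected pages into a new document, and pdf-lib's `copyPages` takes 0-based page indices. The dependency-free sketch below converts a 1-based, print-dialog-style range string such as `"1-3,5"` into those indices; `parsePageRanges` and the range syntax are assumptions for illustration, not part of pdf-lib.

```ts
// Hypothetical helper: turns a 1-based page spec like "1-3,5" into the
// 0-based indices that pdf-lib's copyPages() expects.
function parsePageRanges(spec: string, pageCount: number): number[] {
  const indices: number[] = [];
  const parts = spec.split(',').map((s) => s.trim()).filter((s) => s.length > 0);
  for (const part of parts) {
    const bounds = part.split('-').map((s) => s.trim());
    if (bounds.length > 2) {
      throw new RangeError(`Invalid page range "${part}"`);
    }
    const start = Number(bounds[0]);
    const end = bounds.length === 2 ? Number(bounds[1]) : start;
    // Reject non-integers, reversed ranges, and pages outside the document.
    if (!Number.isInteger(start) || !Number.isInteger(end) || start < 1 || end < start || end > pageCount) {
      throw new RangeError(`Invalid page range "${part}" for a ${pageCount}-page document`);
    }
    for (let page = start; page <= end; page++) indices.push(page - 1);
  }
  return indices;
}
```

Under that assumption, a split would look roughly like `const pages = await out.copyPages(src, parsePageRanges('1-3,5', src.getPageCount()))` followed by `pages.forEach((p) => out.addPage(p))`, where `src` is a loaded `PDFDocument` and `out` was created with `PDFDocument.create()`.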
#### 2. Forensic Utilities (Legacy/Advanced)
- qpdf: CLI tool for structural repairs, encryption, and decryption.
- poppler-utils: Fast C++-based text and image extraction (`pdftotext`, `pdfimages`).
### Implementation Patterns
#### 1. High-Fidelity Generation (Next.js 16.2)
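Rendering PDFs from the same React components (as HTML) that the web app uses keeps documents visually consistent with the app. Because `setContent` renders the markup in a real browser, only pass trusted or sanitized HTML.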
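```ts
// app/api/generate-pdf/route.ts
import puppeteer from 'puppeteer';

// Puppeteer launches a Chromium process, so this route must use the
// Node.js runtime, not the Edge runtime.
export const runtime = 'nodejs';

export async function POST(req: Request) {
  const { htmlContent } = await req.json();
  if (typeof htmlContent !== 'string') {
    return new Response('htmlContent must be a string', { status: 400 });
  }

  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // Wait for network activity to settle so fonts and images are loaded.
    await page.setContent(htmlContent, { waitUntil: 'networkidle0' });
    const pdfBuffer = await page.pdf({
      format: 'A4',
      printBackground: true, // keep CSS background colors and images
      margin: { top: '20px', bottom: '20px' },
    });
    return new Response(pdfBuffer, {
      headers: { 'Content-Type': 'application/pdf' },
    });
  } finally {
    // Always release Chromium, even if rendering throws.
    await browser.close();
  }
}
```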
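The `format` and `margin` options in the route above can instead come from the template itself via CSS print rules. A minimal sketch, assuming the caller supplies trusted body HTML: `printableHtml` and `PrintOptions` are hypothetical names, and Puppeteer's `preferCSSPageSize: true` option has to be passed to `page.pdf()` for the `@page` size to take priority over `format`.

```ts
// Hypothetical options for the print layout.
interface PrintOptions {
  size?: string;   // any CSS @page size value, e.g. "A4" or "Letter landscape"
  margin?: string; // CSS margin shorthand, e.g. "20mm 15mm"
}

// Wraps trusted body HTML in a full document whose print CSS sets the page
// box. Call page.pdf({ preferCSSPageSize: true, printBackground: true }) so
// this @page size takes priority over the `format` option.
function printableHtml(bodyHtml: string, { size = 'A4', margin = '20mm' }: PrintOptions = {}): string {
  return `<!doctype html>
<html>
<head>
<meta charset="utf-8">
<style>
  @page { size: ${size}; margin: ${margin}; }
  /* Ask the renderer not to split figures across pages. */
  figure { break-inside: avoid; }
  /* Ask the browser to print colors and backgrounds as designed. */
  html { -webkit-print-color-adjust: exact; print-color-adjust: exact; }
</style>
</head>
<body>${bodyHtml}</body>
</html>`;
}
```

Because `bodyHtml` is inserted verbatim, escape any user-supplied values before building it.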
#### 2. AI-Driven Semantic Extraction
Using LLMs to turn unstructured PDF text into objects validated against a Zod schema.
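```ts
import { extractText, getDocumentProxy } from 'unpdf';
import { generateObject, type LanguageModel } from 'ai';
import { z } from 'zod';

// Illustrative schema; adapt the fields to your documents.
const invoiceSchema = z.object({
  invoiceNumber: z.string(),
  total: z.number(),
  currency: z.string(),
});

async function extractInvoice(buffer: Uint8Array, model: LanguageModel) {
  // unpdf wraps pdf.js; mergePages joins every page into a single string.
  const pdf = await getDocumentProxy(buffer);
  const { text } = await extractText(pdf, { mergePages: true });
  // `model` is any AI SDK language model instance (e.g. from a provider package).
  const { object } = await generateObject({
    model,
    schema: invoiceSchema,
    prompt: `Extract structured data from this PDF text:\n\n${text}`,
  });
  return object; // validated against invoiceSchema
}
```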
### PDF 2.0 Security & Integrity
#### AES-256 Encryption
PDF 2.0 deprecates the weaker legacy schemes (RC4 and 128-bit AES), leaving 256-bit AES as the recommended option. Use qpdf or a maintained JS wrapper to apply it.
```bash
# Encrypt with 256-bit AES (arguments: user password, owner password, key length)
qpdf --encrypt user-pass owner-pass 256 -- input.pdf secured.pdf
```
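In a JS-first stack the same qpdf command can be run from Node.js. This sketch assumes the `qpdf` binary is on the `PATH`; `qpdfEncryptArgs` and `encryptPdf` are hypothetical helpers. `execFile` runs qpdf without a shell, so paths and passwords need no quoting, but passwords passed as arguments are still visible to other local users through the process list.

```ts
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// Builds the argv for: qpdf --encrypt <user> <owner> 256 -- <in> <out>
function qpdfEncryptArgs(userPassword: string, ownerPassword: string, input: string, output: string): string[] {
  if (ownerPassword === '') {
    // An empty owner password lets anyone unlock full permissions.
    throw new Error('ownerPassword must not be empty');
  }
  return ['--encrypt', userPassword, ownerPassword, '256', '--', input, output];
}

// Runs qpdf directly (no shell); rejects if qpdf exits with a non-zero status.
async function encryptPdf(input: string, output: string, userPassword: string, ownerPassword: string): Promise<void> {
  await execFileAsync('qpdf', qpdfEncryptArgs(userPassword, ownerPassword, input, output));
}
```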
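#### Digital Signatures (PAdES)
Keep signing keys in a Hardware Security Module (HSM) or a remote signing service authenticated through an OIDC provider, and produce PAdES-compliant signatures. Whether a signature is legally binding depends on the certificate type and jurisdiction (for example, qualified certificates under the EU eIDAS regulation).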
### The "Do Not" List (Anti-Patterns)
- NEVER use `pypdf` for complex layout extraction; it often scrambles reading order on multi-column or overlapping text. Use `pdfplumber` or AI OCR instead.
- NEVER generate PDFs with `canvas` drawing commands when HTML/CSS templates are an option; hand-positioned drawing code is a maintenance nightmare.
- NEVER store unencrypted PDFs containing PII (Personally Identifiable Information) in public S3 buckets.
- NEVER rely on `window.print()` for automated server-side generation; it opens an interactive print dialog, and its output depends on the user's print settings.
### Troubleshooting Guide
| Issue | Likely Cause | 2026 Corrective Action |
|---|---|---|
| Missing Fonts | Fonts not installed in the container | Embed the fonts in the HTML template (Google Fonts or self-hosted WOFF2 via `@font-face`) before rendering with Puppeteer. |
| Garbled Text | Complex CID encoding | Use poppler's `pdftotext -enc UTF-8` or an AI-OCR layer. |
| Huge File Size | Unoptimized high-resolution images | Run a compression pass with Ghostscript, or downscale images before embedding them with pdf-lib. |
| Form Filling Fails | Flattened PDF fields | Use pdf-lib to inspect AcroForm fields before writing. |
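For the Missing Fonts row, one way to embed a font is an `@font-face` rule whose source is a base64 data URI, so rendering needs neither installed system fonts nor a network fetch. `embeddedFontFace` is a hypothetical helper; the base64 string is assumed to come from something like `fs.readFileSync(path).toString('base64')` in Node.js.

```ts
// Hypothetical helper: an @font-face rule that embeds a WOFF2 font as a
// base64 data URI, so rendering depends on neither installed system fonts
// nor a network request.
function embeddedFontFace(
  family: string,
  woff2Base64: string,
  weight = 400,
  style: 'normal' | 'italic' = 'normal',
): string {
  // Escape quotes and backslashes so the family name stays a valid CSS string.
  const safeFamily = family.replace(/["\\]/g, '\\$&');
  return [
    '@font-face {',
    `  font-family: "${safeFamily}";`,
    `  src: url(data:font/woff2;base64,${woff2Base64}) format("woff2");`,
    `  font-weight: ${weight};`,
    `  font-style: ${style};`,
    '}',
  ].join('\n');
}
```

Place the returned rule in the template's `<style>` block and reference the same family in `font-family`. If glyphs still fall back, waiting for `document.fonts.ready` inside the page before calling `page.pdf()` may help.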
### Reference Library
- AI Extraction Patterns: Mastering semantic document understanding.
- High-Fidelity Generation: HTML-to-PDF at scale.
- Legacy Utilities: When to reach for Python/CLI tools.
### Standard Operating Procedure (SOP)
- Requirement Check: Is the goal Creation, Extraction, or Modification?
- Tool Selection:
  - Creation -> Puppeteer.
  - Extraction -> AI SDK + unpdf.
  - Modification -> pdf-lib.
- Environment Check: Is this running in an Edge Function? (If yes, avoid Puppeteer).
- Implementation: Build with strict TypeScript typing.
- Audit: Verify PDF 2.0 metadata and accessibility (A11y) tags.
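The tool-selection step of the SOP can be encoded as a small typed function, which keeps the Edge-runtime rule from being forgotten. This is a sketch: `selectTools`, `Goal`, and `ToolChoice` are hypothetical names, and the suggested fallback of moving generation to a Node.js route is an assumption, not part of the SOP.

```ts
type Goal = 'creation' | 'extraction' | 'modification';

interface ToolChoice {
  tools: string[]; // npm package names to reach for
  note?: string;   // why the default was rejected, if it was
}

// Mirrors the SOP: Creation -> Puppeteer, Extraction -> AI SDK + unpdf,
// Modification -> pdf-lib; Puppeteer is ruled out on the Edge runtime.
function selectTools(goal: Goal, runsOnEdge: boolean): ToolChoice {
  switch (goal) {
    case 'creation':
      if (runsOnEdge) {
        // Assumption: the fallback is to move generation to a Node.js route.
        return { tools: [], note: 'Puppeteer needs a Node.js runtime; generate the PDF in a Node.js route instead.' };
      }
      return { tools: ['puppeteer'] };
    case 'extraction':
      return { tools: ['ai', 'unpdf'] };
    case 'modification':
      return { tools: ['pdf-lib'] };
    default: {
      // Compile-time exhaustiveness check over Goal.
      const unreachable: never = goal;
      throw new Error(`Unknown goal: ${String(unreachable)}`);
    }
  }
}
```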
### Quality Metrics
- Extraction Accuracy: > 98% (measured against ground-truth JSON).
- Generation Speed: < 2s for a 10-page document.
- Security Audit: Zero weak crypto algorithms (verified via `qpdf --show-encryption`).
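The SOP does not define how extraction accuracy is scored. One plausible definition, assumed here, is field-level exact match: flatten both JSON objects to dotted paths and count the share of ground-truth fields that were reproduced exactly. `Json`, `flatten`, and `fieldAccuracy` are hypothetical names.

```ts
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Flattens nested objects and arrays to dotted paths, e.g. "lines.0.qty".
function flatten(value: Json, prefix = '', out = new Map<string, Json>()): Map<string, Json> {
  if (value !== null && typeof value === 'object') {
    for (const [key, child] of Object.entries(value)) {
      flatten(child, prefix ? `${prefix}.${key}` : key, out);
    }
  } else {
    out.set(prefix, value);
  }
  return out;
}

// Share of ground-truth leaf fields whose extracted value matches exactly.
function fieldAccuracy(extracted: Json, groundTruth: Json): number {
  const expected = flatten(groundTruth);
  const actual = flatten(extracted);
  if (expected.size === 0) return 1;
  let correct = 0;
  for (const [path, value] of expected) {
    if (actual.has(path) && actual.get(path) === value) correct++;
  }
  return correct / expected.size;
}
```

Under this definition the > 98% target becomes `fieldAccuracy(extracted, truth) > 0.98`, for example averaged over a labeled test set.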
### Last Refactor Details
- By: Gemini Elite Conductor
- Date: January 22, 2026
- Version: 1.1.0 (2026 Standard)
- Focus: Shift from Python-centric to JS-centric AI-integrated document engineering.
End of PDF Pro Standard (v1.1.0)