Use when you have a written implementation plan to execute in a separate session with review checkpoints
npx skills add grahama1970/agent-skills --skill "fixture-tricky"
Install specific skill from multi-skill repository
# Description
>
# SKILL.md
name: fixture-tricky
description: >
Generate adversarial PDF content that breaks extractors. Creates false-positive
tables, malformed tables, cursed text, and layout traps. Extensible registry
for adding new edge cases discovered from real-world PDFs.
allowed-tools: Bash, Read, Write
triggers:
- create tricky pdf
- false positive table
- malformed table
- broken table
- extractor edge case
- adversarial pdf
- cursed text
metadata:
short-description: Adversarial PDF content for extractor stress testing
Fixture Tricky Skill
Generate adversarial PDF content designed to expose extractor bugs. Continuously
extensible as new edge cases are discovered from real-world PDFs.
Why This Exists
Real-world PDFs contain patterns that reliably break extractors:
- Text that Camelot/Marker falsely detect as tables
- Tables corrupted by Word/PDF conversions
- Text with ligatures, special characters, mixed directions
- Layout patterns that confuse section detection
This skill creates reproducible test cases for these issues.
Quick Start
cd .pi/skills/fixture-tricky
# Generate false-positive table content
uv run generate.py false-tables --output false_tables.pdf
# Generate malformed/corrupted tables
uv run generate.py malformed-tables --output malformed.pdf
# Generate text extraction nightmares
uv run generate.py cursed-text --output cursed.pdf
# Generate layout traps
uv run generate.py layout-traps --output layout.pdf
# All-in-one stress test
uv run generate.py gauntlet --output gauntlet.pdf
# List all available tricks
uv run generate.py list-tricks
Trick Categories
False-Positive Tables (false-tables)
Text patterns that extractors incorrectly identify as tables:
| Trick | Description |
|---|---|
numbered-list |
"1. Item one\n2. Item two" with aligned numbers |
address-block |
Multi-line addresses with aligned fields |
code-block |
Indented code with column-like alignment |
signature-block |
Name/title/date aligned like table rows |
key-value-pairs |
"Key: Value" patterns in sequence |
multi-column |
Two-column text layout |
toc-entries |
Table of contents with dotted leaders |
Malformed Tables (malformed-tables)
Real tables with structural problems:
| Trick | Description |
|---|---|
missing-columns |
Rows with fewer cells than header (Word import bug) |
ragged-rows |
Inconsistent column counts across rows |
merged-chaos |
Excessive cell merging breaking structure |
split-table |
Table split across page break |
nested-tables |
Tables inside table cells |
borderless |
No visible borders (detection challenge) |
partial-borders |
Some borders missing |
misaligned-columns |
Columns that don't line up |
Cursed Text (cursed-text)
Text extraction nightmares:
| Trick | Description |
|---|---|
ligatures |
fi, fl, ff, ffi, ffl characters |
math-symbols |
Equations with special notation |
mixed-scripts |
Latin + Greek + Cyrillic |
rtl-mixed |
Right-to-left text mixed with LTR |
subscript-superscript |
Chemical formulas, footnote markers |
invisible-chars |
Zero-width spaces, soft hyphens |
encoding-hell |
Characters that look alike but aren't |
Layout Traps (layout-traps)
Structure/layout patterns that confuse extractors:
| Trick | Description |
|---|---|
deep-nesting |
10+ levels of section hierarchy |
footnote-sections |
Footnotes that look like new sections |
sidebar |
Marginal notes alongside main text |
pull-quote |
Large quoted text in middle of content |
watermark |
Text overlaid with watermark |
rotated-text |
90ยฐ rotated text blocks |
floating-elements |
Content out of reading order |
Adding New Tricks
Tricks are registered in tricks/registry.py. To add a new trick:
# In tricks/registry.py
from .my_new_trick import generate_my_trick
TRICKS["my-new-trick"] = {
"category": "false-tables", # or malformed-tables, cursed-text, layout-traps
"description": "Description of what this trick tests",
"generator": generate_my_trick,
}
Or add directly to generate.py in the appropriate category dict.
Integration with pdf-fixture
Use with pdf-fixture to create comprehensive test suites:
# Generate clean fixture
cd ../pdf-fixture && uv run generate.py simple --output clean.pdf
# Generate tricky fixture
cd ../fixture-tricky && uv run generate.py gauntlet --output tricky.pdf
# Compare extractor results on both
Real-World Discovery Workflow
When you find a PDF that breaks the extractor:
- Identify the problematic pattern
- Add a new trick that reproduces it minimally
- Run
skills-syncto broadcast - Use the trick in regression testing
# Example: Found a PDF where Camelot detects email signatures as tables
uv run generate.py add-trick \
--name "email-signature" \
--category "false-tables" \
--description "Email signature blocks with name/title/phone"
Dependencies
dependencies = [
"pymupdf>=1.23.0",
"reportlab>=4.0.0",
"typer>=0.9.0",
]
# Supported AI Coding Agents
This skill is compatible with the SKILL.md standard and works with all major AI coding agents:
Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.