mjunaidca

working-with-documents

# Install this skill:
npx skills add mjunaidca/mjs-agent-skills --skill "working-with-documents"

Install specific skill from multi-skill repository

# SKILL.md


name: working-with-documents
description: |
Creates and edits Office documents: Word (.docx), PDF, and PowerPoint (.pptx).
Use when working with document creation, PDF manipulation, presentation generation,
tracked changes, or converting between formats.


Working with Documents

Quick Reference

| Format | Read | Create | Edit |
|--------|------|--------|------|
| DOCX | pandoc, python-docx | docx-js | OOXML (unpack/edit/pack) |
| PDF | pdfplumber, pypdf | reportlab | pypdf (merge/split) |
| PPTX | markitdown | html2pptx | OOXML (unpack/edit/pack) |

Word Documents (.docx)

Reading Content

# Convert to markdown (preserves structure)
pandoc document.docx -o output.md

# With tracked changes visible
pandoc --track-changes=all document.docx -o output.md

Creating New Documents

Use docx-js (JavaScript):

const fs = require('fs');
const { Document, Packer, Paragraph, TextRun } = require('docx');

const doc = new Document({
  sections: [{
    children: [
      new Paragraph({
        children: [
          new TextRun({ text: "Hello World", bold: true }),
        ],
      }),
    ],
  }],
});

Packer.toBuffer(doc).then(buffer => {
  fs.writeFileSync("output.docx", buffer);
});

Editing Existing Documents (Tracked Changes)

# 1. Unpack
python ooxml/scripts/unpack.py document.docx unpacked/

# 2. Edit XML files in unpacked/word/document.xml
# Key files:
#   - word/document.xml (main content)
#   - word/comments.xml (comments)
#   - word/media/ (images)

# 3. Pack
python ooxml/scripts/pack.py unpacked/ edited.docx
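
A .docx or .pptx file is a ZIP archive of XML parts, so the unpack and pack steps can be approximated with the standard library when the scripts are not available. The following is a minimal sketch under that assumption; `unpack_ooxml` and `pack_ooxml` are hypothetical names, and the repository's own `unpack.py` and `pack.py` may do more (for example, reformatting or validating the XML).

```python
import zipfile
from pathlib import Path
from typing import List


def unpack_ooxml(package_path: str, out_dir: str) -> List[str]:
    """Extract every part of a .docx/.pptx package into out_dir."""
    with zipfile.ZipFile(package_path) as archive:
        archive.extractall(out_dir)
        return archive.namelist()


def pack_ooxml(src_dir: str, package_path: str) -> None:
    """Zip src_dir back into a package, storing paths relative to src_dir."""
    root = Path(src_dir)
    with zipfile.ZipFile(package_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # Part names inside the package always use forward slashes.
                archive.write(path, path.relative_to(root).as_posix())
```

Calling `unpack_ooxml("document.docx", "unpacked/")`, editing `unpacked/word/document.xml`, and then calling `pack_ooxml("unpacked/", "edited.docx")` mirrors the three steps above.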

Tracked changes XML pattern:

<!-- Deletion -->
<w:del><w:r><w:delText>old text</w:delText></w:r></w:del>

<!-- Insertion -->
<w:ins><w:r><w:t>new text</w:t></w:r></w:ins>
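
In real documents the `w:ins` and `w:del` elements also carry revision attributes (`w:id`, `w:author`, `w:date`), which the short pattern above leaves out. The sketch below shows one way to generate a replacement as a deletion followed by an insertion. The helper name `tracked_replacement` and the id numbering scheme are assumptions for illustration; ids only need to be unique within the document.

```python
from datetime import datetime, timezone
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


def tracked_replacement(old: str, new: str, author: str, change_id: int,
                        when: Optional[datetime] = None) -> str:
    """Return a <w:del> followed by a <w:ins> that replaces old with new."""
    when = when or datetime.now(timezone.utc)
    stamp = when.strftime("%Y-%m-%dT%H:%M:%SZ")
    # quoteattr adds the surrounding quotes and escapes the author name.
    attrs = f'w:author={quoteattr(author)} w:date="{stamp}"'
    return (
        f'<w:del w:id="{change_id}" {attrs}>'
        f'<w:r><w:delText>{escape(old)}</w:delText></w:r></w:del>'
        f'<w:ins w:id="{change_id + 1}" {attrs}>'
        f'<w:r><w:t>{escape(new)}</w:t></w:r></w:ins>'
    )


print(tracked_replacement("old text", "new text", "Editor", 100))
```

The returned fragment is meant to be placed inside a `w:p` element in `word/document.xml`; run formatting (`w:rPr`) from the original run would need to be copied in by hand.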

PDF Documents

Reading PDFs

import pdfplumber

# Extract text
with pdfplumber.open("document.pdf") as pdf:
    for page in pdf.pages:
        print(page.extract_text())

# Extract tables
with pdfplumber.open("document.pdf") as pdf:
    for page in pdf.pages:
        tables = page.extract_tables()
        for table in tables:
            for row in table:
                print(row)
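
Each table returned by `extract_tables()` is a list of rows, and each row is a list of cells that are either strings or `None`. A possible next step is writing such a table to CSV. The sketch below uses only the standard library; `table_to_csv` is a hypothetical helper and the sample rows are made up for illustration.

```python
import csv
import io
from typing import List, Optional


def table_to_csv(rows: List[List[Optional[str]]]) -> str:
    """Serialize one extracted table to CSV text, mapping None cells to ''."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        # Collapse line breaks inside a cell so each record stays on one line.
        writer.writerow([" ".join((cell or "").split()) for cell in row])
    return buffer.getvalue()


# Hypothetical rows shaped like one table from page.extract_tables().
sample = [["Item", "Qty"], ["Widget\nLarge", "2"], ["Gadget", None]]
print(table_to_csv(sample))
```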

Creating PDFs

from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Paragraph
from reportlab.lib.styles import getSampleStyleSheet

doc = SimpleDocTemplate("output.pdf", pagesize=letter)
styles = getSampleStyleSheet()
story = [
    Paragraph("Report Title", styles['Title']),
    Paragraph("Body text goes here.", styles['Normal']),
]
doc.build(story)

Merging/Splitting PDFs

from pypdf import PdfReader, PdfWriter

# Merge
writer = PdfWriter()
for pdf_file in ["doc1.pdf", "doc2.pdf"]:
    reader = PdfReader(pdf_file)
    for page in reader.pages:
        writer.add_page(page)
# Use a context manager so the output file is flushed and closed
with open("merged.pdf", "wb") as f:
    writer.write(f)

# Split
reader = PdfReader("input.pdf")
for i, page in enumerate(reader.pages):
    writer = PdfWriter()
    writer.add_page(page)
    with open(f"page_{i+1}.pdf", "wb") as f:
        writer.write(f)

Command-Line Tools

# Extract text
pdftotext input.pdf output.txt
pdftotext -layout input.pdf output.txt  # Preserve layout

# Merge with qpdf
qpdf --empty --pages file1.pdf file2.pdf -- merged.pdf

# Split pages
qpdf input.pdf --pages . 1-5 -- pages1-5.pdf

PowerPoint Presentations (.pptx)

Reading Content

# Convert to markdown
python -m markitdown presentation.pptx

Creating New Presentations

Use html2pptx workflow:

  1. Create HTML slides (720pt × 405pt for 16:9)
  2. Convert with html2pptx.js library
  3. Validate with thumbnail grid

# Create thumbnails for validation
python scripts/thumbnail.py output.pptx --cols 4
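
The 720pt × 405pt slide size used in step 1 can be checked against the 16:9 ratio and converted to the other units that appear when working with slides. This worked example relies on two fixed conversions: 72 points per inch and 914,400 EMU (the unit used in OOXML) per inch. The function name `slide_size` is hypothetical.

```python
from fractions import Fraction

POINTS_PER_INCH = 72
EMU_PER_POINT = 12_700  # 914,400 EMU per inch / 72 points per inch


def slide_size(width_pt: int, height_pt: int) -> dict:
    """Describe a slide size given in points in the other common units."""
    return {
        "aspect": Fraction(width_pt, height_pt),
        "inches": (width_pt / POINTS_PER_INCH, height_pt / POINTS_PER_INCH),
        "emu": (width_pt * EMU_PER_POINT, height_pt * EMU_PER_POINT),
    }


size = slide_size(720, 405)
print(size["aspect"])   # 16/9
print(size["inches"])   # (10.0, 5.625)
print(size["emu"])      # (9144000, 5143500)
```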

Editing Existing Presentations

# 1. Unpack
python ooxml/scripts/unpack.py presentation.pptx unpacked/

# Key files:
#   - ppt/slides/slide1.xml, slide2.xml, etc.
#   - ppt/notesSlides/ (speaker notes)
#   - ppt/media/ (images)

# 2. Edit XML

# 3. Validate
python ooxml/scripts/validate.py unpacked/ --original presentation.pptx

# 4. Pack
python ooxml/scripts/pack.py unpacked/ edited.pptx

Rearranging Slides

# Duplicate, reorder, delete slides
python scripts/rearrange.py template.pptx output.pptx 0,3,3,5,7
# Creates: slide 0, slide 3 (twice), slide 5, slide 7
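
The sequence argument reads as a comma-separated list of source slide indices, where repeating an index duplicates a slide and omitting one deletes it. The sketch below only parses and validates such a sequence; it does not reproduce `rearrange.py`. Zero-based indexing is inferred from the "slide 0" comment above, and the slide count of 8 is an assumption for the example.

```python
from typing import List


def parse_slide_sequence(spec: str, slide_count: int) -> List[int]:
    """Turn '0,3,3,5,7' into [0, 3, 3, 5, 7], rejecting out-of-range indices."""
    indices = [int(part) for part in spec.split(",") if part.strip()]
    for index in indices:
        if not 0 <= index < slide_count:
            raise ValueError(f"slide index {index} is outside 0-{slide_count - 1}")
    return indices


order = parse_slide_sequence("0,3,3,5,7", slide_count=8)
print(order)                                # [0, 3, 3, 5, 7]
print(sorted(set(range(8)) - set(order)))   # dropped slides: [1, 2, 4, 6]
```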

Converting Between Formats

# DOCX/PPTX to PDF
soffice --headless --convert-to pdf document.docx

# PDF to images
pdftoppm -jpeg -r 150 document.pdf page
# Creates: page-1.jpg, page-2.jpg, etc.

# DOCX to Markdown
pandoc document.docx -o output.md
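
When conversions run from a script rather than a shell, the LibreOffice command can be wrapped in a small function. This is a sketch, not part of the skill's scripts: `build_convert_command` and `convert` are hypothetical names, and the returned path assumes LibreOffice names the output after the input file's stem. The `--outdir` flag controls where the converted file is written.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_convert_command(source: str, out_dir: str, target: str = "pdf") -> List[str]:
    """Build the soffice command line without running it."""
    return ["soffice", "--headless", "--convert-to", target,
            "--outdir", out_dir, source]


def convert(source: str, out_dir: str, target: str = "pdf") -> Path:
    """Run the conversion and return the path LibreOffice is expected to write."""
    if shutil.which("soffice") is None:
        raise RuntimeError("soffice not found on PATH")
    subprocess.run(build_convert_command(source, out_dir, target), check=True)
    return Path(out_dir) / (Path(source).stem + "." + target)
```

Keeping the command builder separate from the runner makes it possible to log or inspect the command without LibreOffice installed.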

OCR for Scanned Documents

import pytesseract
from pdf2image import convert_from_path

images = convert_from_path('scanned.pdf')
text = ""
for image in images:
    text += pytesseract.image_to_string(image)

Design Guidelines (Presentations)

Color Palettes

Pick 3-5 colors that work together:

| Palette | Colors |
|---------|--------|
| Classic Blue | Navy #1C2833, Slate #2E4053, Silver #AAB7B8 |
| Teal & Coral | Teal #5EA8A7, Coral #FE4447, White #FFFFFF |
| Black & Gold | Gold #BF9A4A, Black #000000, Cream #F4F6F6 |

Web-Safe Fonts Only

Arial, Helvetica, Times New Roman, Georgia, Verdana, Tahoma, Trebuchet MS, Courier New, Impact

Layout Rules

  • Two-column: Use for exactly 2 distinct items
  • Three-column: Use for exactly 3 items
  • Never vertically stack charts below text
  • Full-bleed images with text overlays work well

Dependencies

# Python
pip install pypdf pdfplumber reportlab python-docx openpyxl
pip install "markitdown[pptx]" pytesseract pdf2image  # PPTX reading and OCR

# System tools
apt-get install pandoc poppler-utils libreoffice qpdf tesseract-ocr

# Node.js (for docx-js)
npm install docx
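
Before running any workflow it can help to confirm which dependencies are actually present. The sketch below is one way to do that with the standard library. It is not the skill's `verify.py`; the function name and the two lists are assumptions for illustration.

```python
import importlib.util
import shutil

# Import names, which can differ from pip names (python-docx imports as docx).
PYTHON_MODULES = ["pypdf", "pdfplumber", "reportlab", "docx", "openpyxl"]
# Executables provided by pandoc, poppler-utils, and libreoffice.
SYSTEM_TOOLS = ["pandoc", "pdftotext", "pdftoppm", "soffice"]


def missing_dependencies(modules=PYTHON_MODULES, tools=SYSTEM_TOOLS) -> dict:
    """Return the modules and executables that cannot be found."""
    return {
        "modules": [m for m in modules if importlib.util.find_spec(m) is None],
        "tools": [t for t in tools if shutil.which(t) is None],
    }


print(missing_dependencies())
```

Two empty lists in the result mean every listed module and tool was found.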

Verification

Run: python scripts/verify.py

Related Skills

  • working-with-spreadsheets - Excel file handling
  • building-nextjs-apps - Frontend for document uploads
