sunbos

html2md

# Install this skill:
npx skills add sunbos/sunbo-skills --skill "html2md"

Install specific skill from multi-skill repository


# SKILL.md


name: html2md
description: |
Convert HTML files to Markdown format with intelligent preprocessing.
Use when: (1) converting a single HTML file to Markdown, (2) batch converting
HTML files in a directory, (3) processing saved web pages (SingleFile),
(4) converting documentation sites to Markdown.


HTML to Markdown Converter

A production-grade HTML to Markdown converter with two interchangeable engines, markdownify and html2text, selectable with `--engine`.

Quick Start

```bash
# Define script path
SCRIPT="${PLUGIN_DIR}/skills/html2md/scripts/html2md.py"

# Convert all HTML files in current directory
python3 "$SCRIPT" .

# Convert specific directory with output folder
python3 "$SCRIPT" ./docs -o ./markdown

# Recursive conversion
python3 "$SCRIPT" ./website -r -o ./output

# Force reconversion (ignore timestamps)
python3 "$SCRIPT" ./docs -f

# Dry run (preview only)
python3 "$SCRIPT" ./docs --dry-run
```
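
When the converter is driven from another Python program rather than a shell, the same invocations can be assembled as an argument list. The sketch below is only an illustration: `build_html2md_command` is a hypothetical helper, not part of the skill, and it uses only the flags shown in the Quick Start.

```python
import shlex
import sys
from typing import List, Optional


def build_html2md_command(
    script: str,
    input_path: str,
    output_dir: Optional[str] = None,
    recursive: bool = False,
    force: bool = False,
    dry_run: bool = False,
) -> List[str]:
    """Assemble an argv list from the flags shown in the Quick Start.

    Hypothetical helper: the script path and input path are passed through
    unchanged, and only -o, -r, -f and --dry-run are emitted.
    """
    cmd = [sys.executable, script, input_path]
    if output_dir is not None:
        cmd += ["-o", output_dir]
    if recursive:
        cmd.append("-r")
    if force:
        cmd.append("-f")
    if dry_run:
        cmd.append("--dry-run")
    return cmd


if __name__ == "__main__":
    # Placeholder script path; substitute the real location of html2md.py.
    cmd = build_html2md_command(
        "html2md.py", "./website", output_dir="./output",
        recursive=True, dry_run=True,
    )
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would execute it. Starting with `dry_run=True` previews the batch before anything is written.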

Common Options

| Option | Description |
|--------|-------------|
| `-o, --output DIR` | Output directory (default: same as input) |
| `-r, --recursive` | Process subdirectories |
| `-f, --force` | Force conversion even if output is newer |
| `--engine {auto,markdownify,html2text}` | Conversion engine |
| `--preset {default,compact,strict}` | Conversion preset |
| `--aggressive` | Aggressive HTML cleaning (removes more elements) |
| `--pattern GLOB` | File pattern (default: `*.html`) |
| `--dry-run` | Preview without converting |
| `-v, --verbose` | Verbose output |
| `-q, --quiet` | Quiet mode (errors only) |
| `-c, --config FILE` | Load settings from YAML config |
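
The way `--pattern`, `-r`, `-o`, and `-f` interact can be pictured with a small sketch. This is not the script's actual code. The function names are hypothetical, and two behaviors are assumptions: that the output directory mirrors the input tree, and that a file is skipped when its output is at least as new as its source.

```python
from pathlib import Path
from typing import List, Optional


def find_inputs(root: Path, pattern: str = "*.html", recursive: bool = False) -> List[Path]:
    """Return files under root matching pattern; descend only when recursive."""
    matches = root.rglob(pattern) if recursive else root.glob(pattern)
    return sorted(p for p in matches if p.is_file())


def output_path(src: Path, root: Path, out_dir: Optional[Path] = None) -> Path:
    """Map page.html to page.md, next to the source or under out_dir.

    Assumption: with an output directory, the relative layout is preserved.
    """
    if out_dir is None:
        return src.with_suffix(".md")
    return out_dir / src.relative_to(root).with_suffix(".md")


def needs_conversion(src: Path, dst: Path, force: bool = False) -> bool:
    """True when dst is missing, older than src, or force is set.

    Assumption: equal modification times count as up to date.
    """
    if force or not dst.exists():
        return True
    return dst.stat().st_mtime < src.stat().st_mtime
```

A dry run then amounts to listing every `find_inputs` result for which `needs_conversion` is true, without writing anything.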

Presets

| Preset | Description |
|--------|-------------|
| default | Standard conversion with escape handling |
| compact | Minimal escaping, single-line breaks |
| strict | Maximum escaping for clean output |

Dependencies

Required (install at least one conversion engine):

macOS (Homebrew Python 3.13)

```bash
pip3.13 install markdownify html2text --break-system-packages

# Optional
pip3.13 install charset-normalizer tqdm pyyaml --break-system-packages
```

macOS (Xcode Python 3.9)

```bash
xcrun python3 -m pip install markdownify html2text --user

# Optional
xcrun python3 -m pip install charset-normalizer tqdm pyyaml --user
```

Linux / Windows

```bash
pip install markdownify html2text

# Optional
pip install charset-normalizer tqdm pyyaml
```

Optional packages and the features they enable:

| Package | Feature |
|---------|---------|
| charset-normalizer | Auto encoding detection (CJK support) |
| tqdm | Progress bar for batch conversion |
| pyyaml | YAML config file support |
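
Before running a batch, it can help to confirm which of these packages the current interpreter can actually see. The following is a minimal sketch, not part of the skill: `installed` and `check_dependencies` are hypothetical names. The only facts it relies on are the import names of the packages (`charset_normalizer` for charset-normalizer, `yaml` for pyyaml).

```python
from importlib.util import find_spec
from typing import Dict, List

# pip package name -> importable module name
ENGINES: Dict[str, str] = {"markdownify": "markdownify", "html2text": "html2text"}
OPTIONAL: Dict[str, str] = {
    "charset-normalizer": "charset_normalizer",
    "tqdm": "tqdm",
    "pyyaml": "yaml",
}


def installed(packages: Dict[str, str]) -> List[str]:
    """Return the pip names whose module can be located, without importing it."""
    return [name for name, module in packages.items() if find_spec(module) is not None]


def check_dependencies() -> bool:
    """Print what is available; return True if at least one engine is present."""
    engines = installed(ENGINES)
    extras = installed(OPTIONAL)
    print("engines:", ", ".join(engines) or "none (install markdownify or html2text)")
    print("optional:", ", ".join(extras) or "none")
    return bool(engines)


if __name__ == "__main__":
    check_dependencies()
```

Run it with the same interpreter that will run the converter (for example `pip3.13`'s Python or `xcrun python3`), since each interpreter has its own package set.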

Usage Examples

Single file conversion

```bash
python3 "$SCRIPT" ./page.html
# Creates ./page.md
```

Batch convert documentation site

```bash
python3 "$SCRIPT" ./docs -r -o ./docs-md --preset compact
```

Convert SingleFile saved pages

```bash
python3 "$SCRIPT" ~/Downloads --pattern "*.html" --aggressive
```

Use with config file

```bash
python3 "$SCRIPT" -c config.yaml ./input
```

Config File Example (config.yaml)

```yaml
engine: auto
preset: default
clean_html: true
aggressive_clean: false
add_title: true
encoding: utf-8
```
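
A config like this can be checked before a long batch run. The sketch below is hypothetical and is not how the script itself parses its config: `Settings` and `settings_from_mapping` are illustrative names, the field defaults simply copy the example values, and rejecting unknown keys is an assumption. The allowed `engine` and `preset` values come from the options table.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping

ENGINES = {"auto", "markdownify", "html2text"}
PRESETS = {"default", "compact", "strict"}


@dataclass
class Settings:
    # Defaults mirror the example config; the script's real defaults may differ.
    engine: str = "auto"
    preset: str = "default"
    clean_html: bool = True
    aggressive_clean: bool = False
    add_title: bool = True
    encoding: str = "utf-8"


def settings_from_mapping(data: Mapping[str, Any]) -> Settings:
    """Build Settings from a parsed config, rejecting unknown keys and values."""
    known = {f.name for f in fields(Settings)}
    unknown = set(data) - known
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    settings = Settings(**data)
    if settings.engine not in ENGINES:
        raise ValueError(f"engine must be one of {sorted(ENGINES)}")
    if settings.preset not in PRESETS:
        raise ValueError(f"preset must be one of {sorted(PRESETS)}")
    return settings
```

With pyyaml installed, the mapping would typically come from `yaml.safe_load` applied to the opened config file.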
