website-branding-extractor

by @ahutanu in Web & API

# Install this skill:

npx skills add ahutanu/awesome-agent-skills --skill "website-branding-extractor"

# Description

Use when a user wants to extract branding assets from a website, mirror its design system locally, generate reusable static templates from a URL, inventory logos/styles/fonts/images, or validate whether a locally bundled template matches the source site's UI and branding.

# SKILL.md

name: website-branding-extractor
description: Use when a user wants to extract branding assets from a website, mirror its design system locally, generate reusable static templates from a URL, inventory logos/styles/fonts/images, or validate whether a locally bundled template matches the source site's UI and branding.

Website Branding Extractor

Extract the target URL's brand surface end-to-end: discover, inventory, download, template, and validate.

Required Intake

Collect the exact page or site URL from the user before doing anything else.
Confirm scope explicitly:
- single page only, or full domain;
- include CDN-hosted assets, or same-domain only;
- strict visual parity target, or "close match" fallback.

Default to full domain with strict visual parity when the user does not specify.

Workflow

1. Inventory all assets from sitemap + page crawl

Run:

python scripts/site_asset_inventory.py "https://example.com" --use-playwright

Use --use-playwright for JS-heavy sites or when a static crawl misses hero/content assets.
Use --ignore-ssl-errors only when the target site has a broken or expired certificate and you still need a one-off extraction.

Inventory script responsibilities:
- parse robots.txt for sitemap entries;
- crawl sitemap indexes and URL sets deeply, including off-host sitemap URLs declared in robots.txt;
- fetch discovered pages and extract asset references from HTML/CSS;
- recursively discover additional same-site HTML pages from internal links when sitemap coverage is incomplete;
- optionally capture runtime-loaded assets with Playwright network tracing;
- capture both desktop and mobile rendered HTML snapshots for responsive sites when Playwright is enabled;
- classify assets (image, icon, stylesheet, font, script, media, other);
- output machine-readable and human-readable inventory reports.
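
The first and second-to-last responsibilities above can be illustrated with a minimal sketch. This is not the code of site_asset_inventory.py, which is not shown here; the function names and the extension table are hypothetical, and only the category names come from the list above.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical extension table. The categories mirror the list above, but the
# real mapping used by the inventory script is an assumption.
EXTENSION_CATEGORIES = {
    ".png": "image", ".jpg": "image", ".jpeg": "image", ".gif": "image",
    ".webp": "image", ".svg": "image", ".avif": "image",
    ".ico": "icon",
    ".css": "stylesheet",
    ".woff": "font", ".woff2": "font", ".ttf": "font", ".otf": "font",
    ".js": "script", ".mjs": "script",
    ".mp4": "media", ".webm": "media", ".mp3": "media",
}


def sitemap_urls_from_robots(robots_txt: str) -> list[str]:
    """Return URLs declared on 'Sitemap:' lines, in order, without duplicates."""
    found: list[str] = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            url = value.strip()
            if url and url not in found:
                found.append(url)  # off-host sitemap URLs are kept as well
    return found


def classify_asset(url: str) -> str:
    """Classify an asset URL by file extension, ignoring the query string."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return EXTENSION_CATEGORIES.get(suffix, "other")
```

Extension-based classification is only a first pass: an asset served without an extension falls into "other" here, so a fuller implementation would probably also consult the response Content-Type.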

2. Pull all assets locally into `assets/`

Run:

python scripts/download_assets.py \
  --inventory assets/inventory/<domain>/asset_inventory.json

Download script responsibilities:
- download all inventoried assets into assets/raw/<domain>/;
- preserve deterministic paths for stable references;
- generate checksums and manifest files;
- report failed downloads explicitly;
- accept --ignore-ssl-errors when the source cannot be fetched with normal TLS validation.
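
As a rough illustration of "deterministic paths" and "checksums", the sketch below maps a URL to a stable local path and hashes downloaded bytes. The directory layout under assets/raw, the query-string tag, and both function names are assumptions made for this example, not the behavior of download_assets.py.

```python
import hashlib
from pathlib import PurePosixPath
from urllib.parse import urlparse


def local_asset_path(url: str, root: str = "assets/raw") -> str:
    """Map a URL to a stable local path (hypothetical layout: root/host/path)."""
    parts = urlparse(url)
    path = parts.path
    if not path or path.endswith("/"):
        path += "index"  # directory-style URLs still need a file name
    # Drop empty, "." and ".." segments so a path can never escape the root.
    segments = [s for s in path.split("/") if s not in ("", ".", "..")]
    if not segments:
        segments = ["index"]
    rel = PurePosixPath(*segments)
    if parts.query:
        # Same path with different query strings must not collide on disk.
        tag = hashlib.sha256(parts.query.encode("utf-8")).hexdigest()[:8]
        rel = rel.with_name(f"{rel.stem}.{tag}{rel.suffix}")
    host = parts.hostname or "unknown-host"
    return str(PurePosixPath(root, host, rel))


def sha256_hex(data: bytes) -> str:
    """Checksum of downloaded bytes, suitable for a manifest entry."""
    return hashlib.sha256(data).hexdigest()
```

Because the path depends only on the URL, re-running a download writes each asset to the same location, which keeps references in generated templates stable.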

3. Build reusable static template bundle

Run:

python scripts/build_brand_template.py \
  --source-url "https://example.com" \
  --inventory assets/inventory/<domain>/asset_inventory.json \
  --manifest assets/raw/<domain>/download_manifest.json

Template builder responsibilities:
- copy assets into a deployable static bundle;
- rewrite HTML/CSS references to local bundle paths;
- rewrite copied stylesheet url(...) and @import dependencies to local bundle paths;
- rewrite and neutralize non-essential runtime scripts that would otherwise reintroduce remote analytics/editor dependencies;
- generate an editable starter template preserving visual language;
- emit responsive desktop/mobile HTML variants plus a single index.html router when a site renders materially different DOMs by viewport;
- extract brand tokens (colors, fonts) into brand-tokens.css;
- create an asset showcase page for inspection;
- emit remote_reference_audit.json so remaining network dependencies are explicit;
- accept --ignore-ssl-errors when the reference page itself has an invalid TLS certificate.
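
The stylesheet rewriting step might look roughly like the following. It is a simplified sketch, not the builder's actual logic: the function name and the url_to_local mapping are hypothetical, it expects absolute URLs as keys, and it handles only url(...) references, not bare @import "file.css" strings.

```python
import re

# Matches url(...) with optional single or double quotes around the target.
CSS_URL = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def rewrite_css_urls(css: str, url_to_local: dict[str, str]) -> tuple[str, list[str]]:
    """Rewrite known url(...) targets to local paths.

    Returns the rewritten CSS plus the remote URLs that had no local copy,
    which is the kind of information a remote-reference audit would record.
    """
    remaining: list[str] = []

    def replace(match: re.Match) -> str:
        target = match.group(2)
        if target.startswith("data:"):
            return match.group(0)  # inline data URIs are already local
        local = url_to_local.get(target)
        if local is None:
            if target.startswith(("http://", "https://", "//")):
                remaining.append(target)
            return match.group(0)  # leave unknown references untouched
        return f'url("{local}")'

    return CSS_URL.sub(replace, css), remaining
```

A regular expression is adequate for a sketch, but production CSS can contain escaped parentheses and other edge cases where a real CSS parser would be safer.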

Bundle output path:
- assets/templates/<domain>/bundle/

4. Validate UI/UX fidelity against source

Install dependencies when missing:

python -m venv .venv-branding
source .venv-branding/bin/activate
python -m pip install playwright pillow
python -m playwright install chromium

Run:

python scripts/validate_template_fidelity.py \
  --source-url "https://example.com" \
  --template-dir assets/templates/<domain>/bundle \
  --threshold 2.5

Validation responsibilities:
- render source and generated template at desktop and mobile viewports;
- block outbound template requests so missing local dependencies cannot hide behind live network fetches;
- capture screenshots and thresholded pixel-diff artifacts;
- fail when mismatch exceeds threshold;
- write detailed report and diff images.
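
To make "thresholded pixel-diff" concrete, here is a small standard-library sketch. It assumes the threshold is a percentage of mismatched pixels, which the document does not state, and the noise_floor parameter and function names are hypothetical. Pixels are plain RGB tuples, as an imaging library could provide them from two equally sized screenshots.

```python
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int]


def mismatch_percent(
    source: Sequence[Pixel],
    template: Sequence[Pixel],
    noise_floor: int = 16,  # hypothetical: per-channel delta treated as noise
) -> float:
    """Percentage of pixels whose largest channel difference exceeds noise_floor."""
    if len(source) != len(template):
        raise ValueError("screenshots must have the same number of pixels")
    if not source:
        return 0.0
    differing = sum(
        1
        for a, b in zip(source, template)
        if max(abs(x - y) for x, y in zip(a, b)) > noise_floor
    )
    return 100.0 * differing / len(source)


def passes_fidelity(
    source: Sequence[Pixel],
    template: Sequence[Pixel],
    threshold: float = 2.5,  # assumed to mean percent, mirroring --threshold 2.5
) -> bool:
    """Fail (return False) only when the mismatch exceeds the threshold."""
    return mismatch_percent(source, template) <= threshold
```

Ignoring small per-channel deltas keeps anti-aliasing and font-rasterization differences from counting as mismatches, so the threshold reflects real layout or asset differences.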

5. Deliver local outputs

Always provide these paths in the final response:
- assets/inventory/<domain>/asset_inventory.json
- assets/raw/<domain>/download_manifest.json
- assets/templates/<domain>/bundle/
- assets/templates/<domain>/bundle/remote_reference_audit.json
- assets/templates/<domain>/bundle/index.desktop.html and index.mobile.html when responsive variants were generated
- assets/validation/<domain>/ (when fidelity validation runs)
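
One way to keep the final response consistent is to derive the list from the domain. The helper below is a hypothetical convenience, not part of the skill's scripts; it only reproduces the paths listed above.

```python
def deliverable_paths(domain: str, responsive: bool = False, validated: bool = False) -> list[str]:
    """Build the output paths to report for a given domain."""
    bundle = f"assets/templates/{domain}/bundle"
    paths = [
        f"assets/inventory/{domain}/asset_inventory.json",
        f"assets/raw/{domain}/download_manifest.json",
        f"{bundle}/",
        f"{bundle}/remote_reference_audit.json",
    ]
    if responsive:
        # Only present when the builder emitted viewport-specific variants.
        paths += [f"{bundle}/index.desktop.html", f"{bundle}/index.mobile.html"]
    if validated:
        # Only present when fidelity validation ran.
        paths.append(f"assets/validation/{domain}/")
    return paths
```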

Quality Bar

Enforce these gates unless the user explicitly relaxes them:
- Inventory completeness: crawl sitemaps and page HTML, then run Playwright capture when needed.
- Asset locality: templates must reference local bundle assets only.
- Visual parity: run screenshot diff for desktop + mobile, with outbound template requests blocked and low-intensity rasterization noise thresholded out.
- Reusability: include editable template page and extracted brand tokens.
- Documentation: provide generated notes + report files and exact local paths.

One-command pipeline

Run the orchestrator to execute the full workflow:

python scripts/run_branding_pipeline.py --url "https://example.com" --use-playwright --validate

For sites with broken certificates, use:

python scripts/run_branding_pipeline.py --url "https://example.com" --use-playwright --validate --ignore-ssl-errors

Read references/workflow-quality-gates.md for strict acceptance criteria and fallback behavior.
Read references/proven-practices.md for the external standards behind sitemap discovery and visual validation choices.

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents:

⚡ Amp 🚀 Antigravity 🤖 Claude Code 🦀 Clawdbot 📝 Codex ▶️ Cursor 🤖 Droid 💎 Gemini CLI 🐙 GitHub Copilot 🪿 Goose 📊 Kilo Code 🔧 Kiro CLI 💻 OpenCode 🦘 Roo Code 🌲 Trae 🏄 Windsurf

