Install with the skills CLI: `npx skills add JackZhu426/web-scraper-crawlee-skill`
Or install directly from the repository: `npx add-skill https://github.com/JackZhu426/web-scraper-crawlee-skill`
# Description
# SKILL.md
```yaml
---
name: web-scraper-builder
description: |
  Build production-ready web scrapers using Crawlee and Playwright. This skill guides you through analyzing websites, identifying selectors, and generating crawler code that follows best practices.

  Use when:
  - User wants to "scrape a website", "extract data from a page", or "build a crawler"
  - User provides a URL and asks to extract product/deal/content information
  - User needs help finding CSS selectors for web elements
  - User wants to create an Apify actor for data extraction
  - User asks about "web scraping", "DOM analysis", or "Playwright selectors"

  Triggers: "scrape", "crawler", "extract from website", "web scraping", "find selectors", "Apify actor", "Playwright", "Crawlee"
---
```
## Web Scraper Builder

Build production-ready Crawlee + Playwright crawlers through guided analysis and code generation.

### Workflow

#### Phase 1: Gather Requirements
Ask the user:
- Target URL: "What's the URL of the page you want to scrape?"
- Data fields: "What data do you need to extract? (e.g., product title, price, description, image, availability)"
- Page type: "Is this a listing page (multiple items) or a detail page (single item)?"
- Analysis method: "Would you like me to analyze the page live using browser tools, or would you prefer to provide HTML snippets?"
#### Phase 2: Analyze Page Structure

**Option A: Live Analysis (Recommended)**

Use `mcp__claude-in-chrome__*` or `mcp__chrome-devtools__*` tools (or the standalone script sketched after this list) to:
1. Navigate to the target URL
2. Take a screenshot for visual reference
3. Analyze the DOM structure
4. Identify data-testid attributes, semantic elements, and class patterns
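If the MCP browser tools aren't available, the same inspection can be done with a short standalone Playwright script. A minimal sketch, assuming Playwright is installed locally; the URL is a placeholder:

```ts
// Minimal sketch: capture a screenshot and list candidate data-testid hooks.
// 'https://example.com/products' is a placeholder URL.
import { chromium } from 'playwright';

const browser = await chromium.launch();
const page = await browser.newPage();
await page.goto('https://example.com/products', { waitUntil: 'domcontentloaded' });
await page.screenshot({ path: 'page.png', fullPage: true });

// Collect every distinct data-testid value on the page - usually the most stable hooks.
const testIds = await page.evaluate(() =>
  Array.from(
    new Set(
      Array.from(document.querySelectorAll('[data-testid]')).map((el) =>
        el.getAttribute('data-testid'),
      ),
    ),
  ),
);
console.log('data-testid candidates:', testIds);

await browser.close();
```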
**Option B: User-Provided HTML**
If the user provides HTML snippets:
1. Analyze the provided markup
2. Identify stable selector patterns
3. Suggest multiple fallback strategies
#### Phase 3: Build Selectors

Follow these selector priority rules (most stable → least stable):
```ts
// Priority 1: data-testid (most stable)
page.locator('[data-testid="product-title"]')

// Priority 2: Semantic/ARIA
page.getByRole('button', { name: /add to cart/i })
page.getByLabel('Search')

// Priority 3: Partial class match (resilient to hash suffixes)
page.locator('[class*="product-card"]')
page.locator('div[class*="price"]')

// Priority 4: Attribute selectors
page.locator('img[src*="cdn"]')
page.locator('a[href*="/product/"]')

// Priority 5: Fallback chains using .or()
page.locator('[data-testid="price"]')
  .or(page.locator('[class*="price"]'))
  .or(page.locator('.price'))
```
Always avoid:
- Exact class names with hashes: `.product-card-abc123`
- Deep nth-child chains: `div:nth-child(3) > div:nth-child(2)`
- Overly generic selectors: `span`, `div`

See `references/selector-patterns.md` for comprehensive selector strategies.
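Before committing to a selector, it can help to check how many elements each candidate matches. A small helper sketch; the selector strings in the usage comment are placeholders, not part of the shipped template:

```ts
import type { Page } from 'playwright';

// Count matches for each candidate so you can pick the most specific
// selector that still hits the target element(s).
async function rankSelectors(page: Page, candidates: string[]): Promise<void> {
  for (const selector of candidates) {
    const count = await page.locator(selector).count();
    console.log(`${selector} -> ${count} match(es)`);
  }
}

// Example usage:
// await rankSelectors(page, ['[data-testid="price"]', '[class*="price"]', '.price']);
```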
#### Phase 4: Generate Crawler Code

Use the template at `scripts/crawler-template.ts` as a starting point. Customize (a minimal sketch of the result follows this list):
- **Input interface** - Define expected inputs (URLs, limits, flags)
- **Router handlers** - Create handlers for listing vs detail pages
- **Extraction logic** - Implement field extraction with fallbacks
- **Error handling** - Wrap extractions in try-catch blocks
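A minimal sketch of what a customized crawler might look like, assuming a listing page that links to `/product/` detail pages; the selectors and field names are placeholders, not the shipped template:

```ts
import { PlaywrightCrawler, createPlaywrightRouter } from 'crawlee';

const router = createPlaywrightRouter();

// Listing pages: enqueue detail-page links for the DETAIL handler.
router.addDefaultHandler(async ({ enqueueLinks }) => {
  await enqueueLinks({ selector: 'a[href*="/product/"]', label: 'DETAIL' });
});

// Detail pages: extract fields with fallback selectors and error handling.
router.addHandler('DETAIL', async ({ page, request, pushData, log }) => {
  try {
    const title = await page
      .locator('[data-testid="product-title"]')
      .or(page.locator('h1[class*="title"]'))
      .first()
      .textContent();
    const price = await page
      .locator('[data-testid="price"]')
      .or(page.locator('[class*="price"]'))
      .first()
      .textContent();
    await pushData({ url: request.url, title: title?.trim(), price: price?.trim() });
  } catch (error) {
    log.warning(`Extraction failed for ${request.url}: ${error}`);
  }
});

const crawler = new PlaywrightCrawler({ requestHandler: router, maxRequestsPerCrawl: 50 });
await crawler.run(['https://example.com/products']);
```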
#### Phase 5: Test & Iterate

- Run the crawler locally: `pnpm start:dev` or `apify run --purge`
- Check extracted data in `storage/datasets/default/` (a quick inspection sketch follows this list)
- Refine selectors based on edge cases
- Add missing fallbacks for failed extractions
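To check results programmatically instead of reading the JSON files by hand, a quick inspection sketch using Crawlee's `Dataset` API; the `price` field is a placeholder for whatever field your crawler extracts:

```ts
import { Dataset } from 'crawlee';

// Open the default dataset (storage/datasets/default/) and print a sample,
// plus any items that are missing a required field.
const dataset = await Dataset.open();
const { items } = await dataset.getData();

console.log(`Extracted ${items.length} items`);
console.log('Sample:', items.slice(0, 3));

const missingPrice = items.filter((item) => !item.price);
console.log(`${missingPrice.length} items missing a price - check selector fallbacks`);
```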
### Key Patterns

**Always use `page.locator()`, never `document.querySelector()`**

```ts
// ✅ Correct - Auto-waits, better errors
const title = await page.locator('h1[class*="title"]').textContent();

// ❌ Wrong - No auto-waiting, fragile
const title = await page.evaluate(() =>
  document.querySelector('h1.title')?.textContent
);
```
**Batch extraction (for listing pages)**

```ts
const products = await page.evaluate(() => {
  return Array.from(document.querySelectorAll('[data-testid="product-card"]'))
    .map((card) => ({
      name: card.querySelector('[data-testid="title"]')?.textContent?.trim(),
      price: card.querySelector('[data-testid="price"]')?.textContent?.trim(),
      url: card.querySelector('a')?.getAttribute('href'),
    }))
    // Objects are always truthy, so filter on a required field rather than Boolean
    .filter((product) => product.name);
});
```
**Waiting strategies**

```ts
// Wait for element
await page.locator('[data-testid="products"]').waitFor({ timeout: 30000 });

// Wait for network idle (after clicking "Load More")
await page.waitForLoadState('networkidle', { timeout: 15000 });

// Wait for count to increase
await page.waitForFunction(
  (min) => document.querySelectorAll('.product').length > min,
  currentCount,
  { timeout: 15000 }
);
```
**Handling popups/modals**

```ts
const cookieBtn = page.locator('button:has-text("Accept")').first();
if (await cookieBtn.isVisible({ timeout: 3000 }).catch(() => false)) {
  await cookieBtn.click();
}
await page.keyboard.press('Escape'); // Fallback
```
### Resources

- `scripts/crawler-template.ts` - Complete Crawlee + Playwright boilerplate
- `references/selector-patterns.md` - Comprehensive selector strategy guide
- `references/common-ecommerce-patterns.md` - Patterns for product pages, deals, pagination
### Output

Deliver:
1. **Selectors document** - List of identified selectors with confidence ratings
2. **Crawler code** - Complete TypeScript file using Crawlee + Playwright
3. **Testing instructions** - How to run and validate the crawler
# README.md
## Web Scraper Builder Skill

A Claude Code skill for building production-ready web scrapers using Crawlee and Playwright.

### What This Skill Does

Guides you through the complete web scraping workflow:
- **Gather Requirements** - Asks about target URL, data fields, and page type
- **Analyze Page Structure** - Uses browser automation or user-provided HTML
- **Build Selectors** - Creates resilient CSS selectors with fallback chains
- **Generate Crawler Code** - Produces complete TypeScript code using Crawlee + Playwright
- **Test & Iterate** - Guides local testing and refinement
### Features

- **Live DOM Analysis** - Uses Claude in Chrome or Chrome DevTools MCP tools
- **Resilient Selectors** - Priority-based strategy (data-testid → ARIA → class* → attributes)
- **Fallback Chains** - Automatic `.or()` fallback patterns for reliability
- **Production Templates** - Complete Crawlee + Playwright boilerplate
- **E-commerce Patterns** - Pre-built patterns for products, deals, pagination
### Installation

Claude Code discovers skills by scanning for `SKILL.md` files in specific directories.

**Option 1: Clone directly to the skills directory (recommended)**

```bash
# Personal scope (all your projects)
cd ~/.claude/skills
git clone https://github.com/JackZhu426/web-scraper-crawlee-skill.git

# Or project scope (current project only)
cd your-project/.claude/skills
git clone https://github.com/JackZhu426/web-scraper-crawlee-skill.git
```
**Option 2: Download and copy**

```bash
# Download/clone anywhere
git clone https://github.com/JackZhu426/web-scraper-crawlee-skill.git

# Copy to the Claude skills directory
cp -r web-scraper-crawlee-skill ~/.claude/skills/web-scraper-builder
```
**Option 3: Extract from a .skill file**

Download `web-scraper-builder.skill` from the releases page, then:

```bash
cd ~/.claude/skills
unzip web-scraper-builder.skill
```
### Verify Installation

The skill should appear when you run `/web-scraper-builder` in Claude Code.
### Usage
Once installed, trigger the skill by asking Claude Code:
- "I want to scrape products from this website"
- "Help me build a crawler for [URL]"
- "Find selectors for extracting prices from this page"
- "Create an Apify actor for data extraction"
Or invoke it directly: `/web-scraper-builder`
### Included Resources

| File | Description |
|---|---|
| `SKILL.md` | Core skill instructions |
| `scripts/crawler-template.ts` | Complete Crawlee + Playwright boilerplate |
| `references/selector-patterns.md` | Comprehensive selector strategy guide |
| `references/common-ecommerce-patterns.md` | E-commerce extraction patterns |
### Requirements

- Claude Code CLI
- Node.js 18+ (for running generated crawlers)
- pnpm (recommended) or npm

### License

MIT License - see LICENSE
# Supported AI Coding Agents
This skill is compatible with the SKILL.md standard and works with all major AI coding agents.
Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.