XPrime17

BrightData

0
0
# Install this skill:
npx skills add XPrime17/Poseidon --skill "BrightData"

Install specific skill from multi-skill repository

# Description

Progressive URL scraping. USE WHEN Bright Data, scrape URL, web scraping tiers. SkillSearch('brightdata') for docs.

# SKILL.md


name: BrightData
description: Progressive URL scraping. USE WHEN Bright Data, scrape URL, web scraping tiers. SkillSearch('brightdata') for docs.
context: fork


Customization

Before executing, check for user customizations at:
~/.claude/skills/PAI/USER/SKILLCUSTOMIZATIONS/BrightData/

If this directory exists, load and apply any PREFERENCES.md, configurations, or resources found there. These override default behavior. If the directory does not exist, proceed with skill defaults.

🚨 MANDATORY: Voice Notification (REQUIRED BEFORE ANY ACTION)

You MUST send this notification BEFORE doing anything else when this skill is invoked.

  1. Send voice notification:
    bash curl -s -X POST http://localhost:8888/notify \ -H "Content-Type: application/json" \ -d '{"message": "Running the WORKFLOWNAME workflow in the BrightData skill to ACTION"}' \ > /dev/null 2>&1 &

  2. Output text notification:
    Running the **WorkflowName** workflow in the **BrightData** skill to ACTION...

This is not optional. Execute this curl command immediately upon skill invocation.

Workflow Routing

When executing a workflow, output this notification directly:

Running the **WorkflowName** workflow in the **Brightdata** skill to ACTION...

CRITICAL: This workflow implements progressive escalation for URL content retrieval.

When user requests scraping/fetching URL content:
Examples: "scrape this URL", "fetch this page", "get content from [URL]", "pull content from this site", "retrieve [URL]", "can't access this site", "this site is blocking me", "use Bright Data to fetch"
β†’ READ: ~/.claude/skills/brightdata/Workflows/Four-tier-scrape.md
β†’ EXECUTE: Four-tier progressive scraping workflow (WebFetch β†’ Curl β†’ Browser Automation β†’ Bright Data MCP)


When to Activate This Skill

Direct Scraping Requests (Categories 1-4)

  • "scrape this URL", "scrape [URL]", "scrape this page"
  • "fetch this URL", "fetch [URL]", "fetch this page", "fetch content from"
  • "pull content from [URL]", "pull this page", "pull from this site"
  • "get content from [URL]", "retrieve [URL]", "retrieve this page"
  • "do scraping on [URL]", "run scraper on [URL]"
  • "basic scrape", "quick scrape", "simple fetch"
  • "comprehensive scrape", "deep scrape", "full content extraction"

Access & Bot Detection Issues (Categories 5-7)

  • "can't access this site", "site is blocking me", "getting blocked"
  • "bot detection", "CAPTCHA", "access denied", "403 error"
  • "need to bypass bot detection", "get around blocking"
  • "this URL won't load", "can't fetch this page"
  • "use Bright Data", "use the scraper", "use advanced scraping"

Result-Oriented Requests (Category 8)

  • "get me the content from [URL]"
  • "extract text from [URL]"
  • "download this page content"
  • "convert [URL] to markdown"
  • "need the HTML from this site"

Use Case Indicators

  • User needs web content for research or analysis
  • Standard methods (WebFetch) are failing
  • Site has bot detection or rate limiting
  • Need reliable content extraction
  • Converting web pages to structured format (markdown)

Core Capabilities

Progressive Escalation Strategy:
1. Tier 1: WebFetch - Fast, simple, built-in Claude Code tool
2. Tier 2: Customized Curl - Chrome-like browser headers to bypass basic bot detection
3. Tier 3: Browser Automation - Full browser automation using Playwright for JavaScript-heavy sites
4. Tier 4: Bright Data MCP - Professional scraping service that handles CAPTCHA and advanced bot detection

Key Features:
- Automatic fallback between tiers
- Preserves content in markdown format
- Handles bot detection and CAPTCHA
- Works with any URL
- Efficient resource usage (only escalates when needed)


Workflow Overview

four-tier-scrape.md - Complete URL content scraping with four-tier fallback strategy
- When to use: Any URL content retrieval request
- Process: Start with WebFetch β†’ If fails, use curl with Chrome headers β†’ If fails, use Browser Automation β†’ If fails, use Bright Data MCP
- Output: URL content in markdown format


Extended Context

Integration Points:
- WebFetch Tool - Built-in Claude Code tool for basic URL fetching
- Bash Tool - For executing curl commands with custom headers
- Browser Automation - Playwright-based browser automation for JavaScript rendering
- Bright Data MCP - mcp__Brightdata__scrape_as_markdown for advanced scraping

When Each Tier Is Used:
- Tier 1 (WebFetch): Simple sites, public content, no bot detection
- Tier 2 (Curl): Sites with basic user-agent checking, simple bot detection
- Tier 3 (Browser Automation): Sites requiring JavaScript execution, dynamic content loading
- Tier 4 (Bright Data): Sites with CAPTCHA, advanced bot detection, residential proxy requirements

Configuration:
No configuration required - all tools are available by default in Claude Code


Examples

Example 1: Simple Public Website

User: "Scrape https://example.com"

Skill Response:
1. Routes to three-tier-scrape.md
2. Attempts Tier 1 (WebFetch)
3. Success β†’ Returns content in markdown
4. Total time: <5 seconds

Example 2: Site with JavaScript Requirements

User: "Can't access this site https://dynamic-site.com"

Skill Response:
1. Routes to four-tier-scrape.md
2. Attempts Tier 1 (WebFetch) β†’ Fails (blocked)
3. Attempts Tier 2 (Curl with Chrome headers) β†’ Fails (JavaScript required)
4. Attempts Tier 3 (Browser Automation) β†’ Success
5. Returns content in markdown
6. Total time: ~15-20 seconds

Example 3: Site with Advanced Bot Detection

User: "Scrape https://protected-site.com"

Skill Response:
1. Routes to four-tier-scrape.md
2. Attempts Tier 1 (WebFetch) β†’ Fails (blocked)
3. Attempts Tier 2 (Curl) β†’ Fails (advanced detection)
4. Attempts Tier 3 (Browser Automation) β†’ Fails (CAPTCHA)
5. Attempts Tier 4 (Bright Data MCP) β†’ Success
6. Returns content in markdown
7. Total time: ~30-40 seconds

Example 4: Explicit Bright Data Request

User: "Use Bright Data to fetch https://difficult-site.com"

Skill Response:
1. Routes to four-tier-scrape.md
2. User explicitly requested Bright Data
3. Goes directly to Tier 4 (Bright Data MCP) β†’ Success
4. Returns content in markdown
5. Total time: ~5-10 seconds


Related Documentation:
- ~/.claude/skills/PAI/SkillSystem.md - Canonical structure guide
- ~/.claude/skills/PAI/CONSTITUTION.md - Overall PAI philosophy

Last Updated: 2025-11-23

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents:

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.