npx skills add Anshin-Health-Solutions/superpai --skill "brightdata"
Install specific skill from multi-skill repository
# Description
Progressive URL scraping with Bright Data. Multiple tiers from free to premium.
# SKILL.md
name: brightdata
description: "Progressive URL scraping with Bright Data. Multiple tiers from free to premium."
triggers:
- Bright Data
- scrape URL
- web scraping
- data collection
Bright Data Skill
Progressive web scraping using a three-tier escalation methodology. Always start at the cheapest tier and escalate only when blocked. This skill covers direct fetching, proxy rotation, and full browser rendering.
Three-Tier Progressive Scraping Methodology
Tier 1: Direct Fetch (Free)
Tools: curl, WebFetch tool
Use when: The target site has no anti-bot protection, the content is public, and no JavaScript rendering is required.
# Simple direct fetch
curl -s -o output.html "https://example.com/page"
# Or use WebFetch tool with extraction prompt
# WebFetch url="https://example.com/page" prompt="Extract the main article text and metadata"
Escalate to Tier 2 when: You receive 403, 429, CAPTCHA challenges, or empty/bot-detection pages.
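The escalation check above can be sketched as a small shell helper. This is a minimal sketch, not part of the skill itself: the function name `is_blocked` and the marker strings in the `grep` pattern are assumptions chosen for illustration, and real bot-detection pages vary by site.

```bash
# Hypothetical helper: decide whether a Tier 1 response looks blocked.
# Usage: is_blocked STATUS_CODE BODY_FILE
# Returns 0 when the response looks blocked, 1 when it looks like real content.
is_blocked() {
  status="$1"
  body_file="$2"

  # Status codes the text names as escalation signals.
  case "$status" in
    403|429) return 0 ;;
  esac

  # A missing or empty body is treated as blocked.
  [ -s "$body_file" ] || return 0

  # Illustrative marker strings only (assumption); tune them per target site.
  if grep -qiE 'captcha|access denied|unusual traffic|verify you are human' "$body_file"; then
    return 0
  fi

  return 1
}

# Example wiring (not executed here): capture status and body with curl.
# status=$(curl -s -o output.html -w '%{http_code}' "https://example.com/page")
# if is_blocked "$status" output.html; then echo "escalate to Tier 2"; fi
```

Keeping the check separate from the fetch makes it reusable at every tier, since the same signals apply to proxied responses.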
Tier 2: Proxy Rotation (Standard)
Tools: Bright Data residential/datacenter proxies
Cost: ~$0.10-0.60 per GB depending on proxy zone type
# Datacenter proxy (cheapest, lowest success rate)
curl -x "http://brd-customer-{CUSTOMER_ID}-zone-datacenter:{PASSWORD}@brd.superproxy.io:22225" \
-s "https://target-site.com/page"
# Residential proxy (more expensive, higher success rate)
curl -x "http://brd-customer-{CUSTOMER_ID}-zone-residential:{PASSWORD}@brd.superproxy.io:22225" \
-s "https://target-site.com/page"
# With country targeting
curl -x "http://brd-customer-{CUSTOMER_ID}-zone-residential-country-us:{PASSWORD}@brd.superproxy.io:22225" \
-s "https://target-site.com/page"
Escalate to Tier 3 when: Proxy requests still return bot detection, site requires JavaScript rendering, or content loads dynamically.
Tier 3: Browser Rendering (Premium)
Tools: Bright Data Scraping Browser or Web Unlocker
Cost: ~$1.00-3.00 per 1K requests
# Web Unlocker API (handles CAPTCHAs, fingerprinting, rendering)
curl -x "http://brd-customer-{CUSTOMER_ID}-zone-unlocker:{PASSWORD}@brd.superproxy.io:22225" \
-s "https://heavily-protected-site.com/page"
# SERP API (specialized for search engines)
curl "https://api.brightdata.com/serp/req?customer={CUSTOMER_ID}&zone=serp" \
-H "Authorization: Bearer ${BRIGHTDATA_TOKEN}" \
-H "Content-Type: application/json" \
-d '{"query": "search terms", "search_engine": "google", "country": "us"}'
Cost Comparison Table
| Tier | Method | Cost per 1K Requests | Success Rate | Speed |
|---|---|---|---|---|
| 1 | Direct curl/WebFetch | Free | 40-60% | Fastest |
| 2a | Datacenter proxy | ~$0.10 | 60-75% | Fast |
| 2b | Residential proxy | ~$0.50 | 80-90% | Medium |
| 3a | Web Unlocker | ~$2.00 | 95-99% | Slower |
| 3b | SERP API | ~$3.00 | 99%+ | Slowest |
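To turn the table into a quick budget estimate, the per-1K rates can be multiplied out. The sketch below hard-codes the approximate rates from the table above; the function name `tier_cost` is an assumption for illustration, and actual pricing depends on your Bright Data plan.

```bash
# Hypothetical helper: estimate cost in USD for a number of requests at a tier.
# Usage: tier_cost TIER REQUESTS   (TIER is 1, 2a, 2b, 3a, or 3b)
# Rates are the approximate per-1K-request figures from the comparison table.
tier_cost() {
  case "$1" in
    1)  rate=0 ;;
    2a) rate=0.10 ;;
    2b) rate=0.50 ;;
    3a) rate=2.00 ;;
    3b) rate=3.00 ;;
    *)  echo "unknown tier: $1" >&2; return 1 ;;
  esac
  # awk handles the floating-point arithmetic: rate * requests / 1000.
  awk -v r="$rate" -v n="$2" 'BEGIN { printf "%.2f\n", r * n / 1000 }'
}

tier_cost 2b 50000   # prints 25.00
tier_cost 3a 50000   # prints 100.00
```

For the same 50,000 requests, residential proxies come to roughly a quarter of the Web Unlocker cost, which is the arithmetic behind starting at the cheapest tier.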
Proxy Zone Configuration
Zones are configured in the Bright Data dashboard (brightdata.com/cp). Each zone has:
- Zone Name: Identifier used in proxy URL (e.g., datacenter, residential, unlocker)
- Proxy Type: Datacenter, ISP, Residential, or Mobile
- Country Targeting: Append -country-{code} to zone name
- Session Management: Add -session-{id} for sticky sessions (same IP across requests)
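The suffix rules above compose into the proxy username. A minimal sketch follows; the helper name `proxy_user` is hypothetical, and placing the country suffix before the session suffix when both are present is an assumption, not something the list above specifies.

```bash
# Hypothetical helper: build the proxy username from zone options.
# Usage: proxy_user CUSTOMER_ID ZONE [COUNTRY] [SESSION_ID]
proxy_user() {
  user="brd-customer-$1-zone-$2"
  # Optional country targeting: append -country-{code}.
  if [ -n "${3:-}" ]; then
    user="$user-country-$3"
  fi
  # Optional sticky session: append -session-{id}.
  # Assumption: session goes after country when both are given.
  if [ -n "${4:-}" ]; then
    user="$user-session-$4"
  fi
  printf '%s\n' "$user"
}

proxy_user C123 residential us   # prints brd-customer-C123-zone-residential-country-us

# Example wiring (not executed here); PASSWORD and PROXY_HOST are supplied by you:
# curl -x "http://$(proxy_user "$CUSTOMER_ID" residential us):$PASSWORD@$PROXY_HOST" \
#   -s "https://target-site.com/page"
```

Passing an empty string for the country (`proxy_user C123 residential "" abc1`) yields a session-only username.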
SERP API Usage
For search engine results specifically, use the SERP API instead of general scraping:
curl "https://api.brightdata.com/serp/req" \
-H "Authorization: Bearer ${BRIGHTDATA_TOKEN}" \
-H "Content-Type: application/json" \
-d '{
"query": "best coffee shops in austin",
"search_engine": "google",
"country": "us",
"num_results": 20,
"parse": true
}'
The parse: true flag returns structured JSON with title, URL, snippet, and position for each result.
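When the query comes from user input, it helps to build the request body programmatically so that quotes do not break the JSON. The sketch below is one way to do that in plain shell; `serp_payload` is a hypothetical helper, it only uses the fields shown in the request above, and it escapes backslashes and double quotes only (control characters and newlines are not handled).

```bash
# Hypothetical helper: build the SERP request body for a query.
# Usage: serp_payload QUERY [COUNTRY] [NUM_RESULTS]
# Defaults (us, 20) mirror the example request above.
serp_payload() {
  # Escape backslashes first, then double quotes, for a valid JSON string.
  q=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"query": "%s", "search_engine": "google", "country": "%s", "num_results": %s, "parse": true}\n' \
    "$q" "${2:-us}" "${3:-20}"
}

# Example wiring (not executed here):
# curl "https://api.brightdata.com/serp/req" \
#   -H "Authorization: Bearer ${BRIGHTDATA_TOKEN}" \
#   -H "Content-Type: application/json" \
#   -d "$(serp_payload 'best coffee shops in austin')"
```

If `jq` is available, building the body with it is more robust than hand-escaping.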
Detailed Process
- Assess the Target -- Check the URL. Is it a public page? Does it require JS? Is it a search engine?
- Start at Tier 1 -- Try direct fetch with curl or WebFetch. Inspect the response for real content.
- Evaluate Response -- Check for: 403/429 status, CAPTCHA HTML, empty body, bot detection messages.
- Escalate if Needed -- Move to Tier 2 (proxy) or Tier 3 (browser/unlocker) based on failure type.
- Extract Content -- Parse the successful HTML response for the target data.
- Return Structured Output -- Format extracted data as JSON matching the parser skill schema.
When to Escalate Between Tiers
| Signal | Current Tier | Action |
|---|---|---|
| HTTP 200 with real content | Any | Success -- do not escalate |
| HTTP 403 or 429 | Tier 1 | Escalate to Tier 2 (datacenter proxy) |
| Bot detection page | Tier 2a | Escalate to Tier 2b (residential proxy) |
| CAPTCHA challenge | Tier 2b | Escalate to Tier 3 (Web Unlocker) |
| JavaScript-rendered content | Tier 1 or 2 | Escalate to Tier 3 (browser rendering) |
| Search engine results | Any | Use SERP API directly |
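The table above maps directly onto a lookup function. This is a sketch under stated assumptions: the function name `next_tier` and the signal labels (`ok`, `http_403`, `http_429`, `bot_page`, `captcha`, `js_content`, `serp`) are invented here for illustration, and any tier/signal pair the table does not list is reported as `unmapped` rather than guessed.

```bash
# Hypothetical helper: look up the next step from the escalation table.
# Usage: next_tier CURRENT_TIER SIGNAL
# CURRENT_TIER: 1, 2a, 2b, or 3
# SIGNAL: ok, http_403, http_429, bot_page, captcha, js_content, serp
next_tier() {
  tier="$1"
  signal="$2"

  # Rows that apply regardless of the current tier.
  case "$signal" in
    ok)   echo "done"; return 0 ;;
    serp) echo "serp"; return 0 ;;
  esac

  # Tier-specific rows, keyed as "tier:signal".
  case "$tier:$signal" in
    1:http_403|1:http_429)                     echo "2a" ;;
    2a:bot_page)                               echo "2b" ;;
    2b:captcha)                                echo "3" ;;
    1:js_content|2a:js_content|2b:js_content)  echo "3" ;;
    *) echo "unmapped"; return 1 ;;
  esac
}

next_tier 1 http_403   # prints 2a
next_tier 2b captcha   # prints 3
```

Returning `unmapped` with a non-zero status lets the caller decide how to treat combinations outside the table, for example by escalating one step.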
When to Use
- User provides a URL and asks to "scrape it", "get the content", "extract data from this site"
- WebFetch returns blocked/empty content and escalation is needed
- Bulk URL scraping where some sites have anti-bot protection
- Search engine result collection (use SERP API path directly)
- Price monitoring, competitive analysis, or market research data collection