# Install this skill:
npx skills add Anshin-Health-Solutions/superpai --skill "firecrawl"

Installs a specific skill from a multi-skill repository.

# Description

Web scraping and research automation using the Firecrawl API. Converts any URL into LLM-optimized markdown, supports crawling entire sites, full-text search, and site mapping. Integrates with the research skill for deep information gathering.

# SKILL.md


---
name: firecrawl
description: Web scraping and research automation using the Firecrawl API. Converts any URL into LLM-optimized markdown, supports crawling entire sites, full-text search, and site mapping. Integrates with the research skill for deep information gathering.
triggers:
- /firecrawl
- "scrape this url"
- "crawl this site"
- "firecrawl search"
- "get content from"
- "scrape and summarize"
---

# Firecrawl

## Purpose

Firecrawl is the authoritative tool for extracting web content in a form LLMs can reason over. Use it any time you need to read a live webpage, crawl a documentation site, search the web for specific content, or map a site's URL structure. Raw HTML is not acceptable input; always pass Firecrawl markdown to the model.


## Installation

```bash
# Install the Firecrawl SDK (TypeScript preferred)
bun add @mendable/firecrawl-js

# Or via npm if required
npm install @mendable/firecrawl-js

# Python fallback (only when TypeScript is not an option)
pip install firecrawl-py
```

Set your API key in the environment before any call:

```bash
export FIRECRAWL_API_KEY="fc-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
```

Obtain your key at https://firecrawl.dev. The free tier supports 500 credits/month. Each scrape costs 1 credit. Crawl jobs cost 1 credit per page.
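
Given those prices, it can help to sanity-check credit spend before starting a large job. A minimal sketch, assuming 1 credit per scrape and 1 credit per crawled page as noted above; `estimateCredits` and `fitsFreeTier` are hypothetical helpers, not SDK functions:

```typescript
// Hypothetical helpers (not part of the SDK): estimate credit spend up front,
// assuming 1 credit per scrape and 1 credit per crawled page.
const FREE_TIER_CREDITS = 500; // free tier allowance per month

function estimateCredits(scrapes: number, crawlPages: number): number {
  return scrapes + crawlPages;
}

function fitsFreeTier(scrapes: number, crawlPages: number): boolean {
  return estimateCredits(scrapes, crawlPages) <= FREE_TIER_CREDITS;
}
```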


## Core API Patterns

### Pattern 1: Single-Page Scrape

Use this when you have a specific URL and need its content.

```typescript
import FirecrawlApp from "@mendable/firecrawl-js";

const app = new FirecrawlApp({ apiKey: process.env.FIRECRAWL_API_KEY });

const result = await app.scrapeUrl("https://example.com/docs/intro", {
  formats: ["markdown"],           // Always request markdown, never html
  onlyMainContent: true,           // Strip nav, footer, sidebar boilerplate
  waitFor: 2000,                   // ms to wait for JS-rendered content
  timeout: 30000,                  // Hard timeout in ms
});

if (result.success) {
  console.log(result.markdown);    // LLM-ready content
  console.log(result.metadata);    // title, description, ogImage, etc.
}
```

Options reference:
- formats: ["markdown"] | ["html"] | ["rawHtml"] | ["screenshot"]
- onlyMainContent: boolean — removes headers, footers, nav elements
- includeTags: ["article", "main"] — include only these HTML tags
- excludeTags: ["nav", "footer", "aside"] — strip these HTML tags
- waitFor: milliseconds to wait for dynamic content
- headers: custom HTTP headers (for auth-gated pages)
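
For auth-gated pages, the options above can be bundled into a reusable preset. A sketch using the option names from the reference list; the `Authorization` value is a placeholder to substitute with your own token:

```typescript
// A reusable scrape-options preset built from the option names listed above.
// "<YOUR_TOKEN>" is a placeholder, not a Firecrawl convention.
const authScrapeOptions = {
  formats: ["markdown"],
  onlyMainContent: true,
  excludeTags: ["nav", "footer", "aside"],
  waitFor: 2000,
  headers: { Authorization: "Bearer <YOUR_TOKEN>" },
};
```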


### Pattern 2: Site Crawl

Use this when you need content from multiple pages of a site (e.g., full documentation).

```typescript
const crawlJob = await app.crawlUrl("https://docs.example.com", {
  limit: 50,                        // Max pages to crawl
  maxDepth: 3,                      // Link depth from start URL
  scrapeOptions: {
    formats: ["markdown"],
    onlyMainContent: true,
  },
  allowBackwardLinks: false,        // Stay within the subtree
  allowExternalLinks: false,        // Do not follow external links
});

// Crawl jobs run asynchronously; poll checkCrawlStatus until completed
if (crawlJob.success) {
  const results = await app.checkCrawlStatus(crawlJob.id);
  for (const page of results.data) {
    console.log(page.url, page.markdown);
  }
}
```

Crawl job lifecycle: pending -> scraping -> completed | failed

Always check results.status === "completed" before consuming data.
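
Since crawl (and batch) jobs complete asynchronously, a generic poll loop is useful. A sketch with the status check injected so it is testable without the API; in practice `checkStatus` would be something like `() => app.checkCrawlStatus(crawlJob.id)`:

```typescript
// Generic poll loop: repeat an injected status check until the job completes.
type JobStatus = { status: string; data?: unknown[] };

async function pollUntilComplete(
  checkStatus: () => Promise<JobStatus>,
  intervalMs = 2000,
  maxPolls = 30,
): Promise<JobStatus> {
  for (let i = 0; i < maxPolls; i++) {
    const job = await checkStatus();
    if (job.status === "completed") return job;
    if (job.status === "failed") throw new Error("Crawl job failed");
    await new Promise((r) => setTimeout(r, intervalMs)); // wait before re-polling
  }
  throw new Error("Crawl job did not complete in time");
}
```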


### Pattern 3: Search

Use this to find pages matching a query without knowing the URL in advance.

```typescript
const searchResult = await app.search("Claude Code custom keybindings site:docs.anthropic.com", {
  limit: 10,                         // Number of results
  lang: "en",
  country: "us",
  scrapeOptions: {
    formats: ["markdown"],
    onlyMainContent: true,
  },
});

for (const item of searchResult.data) {
  console.log(item.url);
  console.log(item.markdown);        // Full page content, not just snippet
}
```

Search uses Firecrawl's own index. For real-time results include the current year in your query. Combine with the research skill to rank and synthesize results.
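
The query conventions above (a `site:` restriction, a current-year freshness hint) can be captured in tiny helpers; both are illustrative conveniences, not SDK functions:

```typescript
// Illustrative query builders for the search pattern above.
function siteQuery(query: string, site: string): string {
  return `${query} site:${site}`; // restrict results to one domain
}

function freshQuery(query: string): string {
  return `${query} ${new Date().getFullYear()}`; // bias toward recent results
}
```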


### Pattern 4: Site Map

Use this to discover all URLs on a site before deciding what to scrape.

```typescript
const mapResult = await app.mapUrl("https://docs.example.com", {
  search: "authentication",          // Optional: filter URLs by keyword
  limit: 200,
});

console.log(mapResult.links);        // Array of discovered URLs
```

Map is cheap (1 credit per call regardless of site size). Always map before crawling large sites so you can filter to relevant sections.
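
The map-then-crawl workflow can be as simple as filtering the returned links before spending crawl credits. A sketch; `filterLinks` is a hypothetical helper operating on an array shaped like `mapResult.links`:

```typescript
// Narrow a mapped URL list to one section of the site before crawling.
function filterLinks(links: string[], pathFragment: string, limit = 50): string[] {
  return links.filter((url) => url.includes(pathFragment)).slice(0, limit);
}
```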


## Rate Limiting

Firecrawl enforces per-minute rate limits based on your plan:

| Plan  | Requests/min | Concurrent |
|-------|--------------|------------|
| Free  | 10           | 2          |
| Hobby | 60           | 5          |
| Pro   | 300          | 20         |
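
To stay under a plan's per-minute budget when scraping sequentially, space requests by at least 60 seconds divided by that budget. A sketch using the plan figures above:

```typescript
// Minimum spacing between sequential requests for each plan's per-minute limit.
const requestsPerMinute = { free: 10, hobby: 60, pro: 300 } as const;

function minDelayMs(plan: keyof typeof requestsPerMinute): number {
  return Math.ceil(60_000 / requestsPerMinute[plan]);
}
```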

Implement backoff for 429 responses:

```typescript
async function scrapeWithRetry(url: string, maxRetries = 3): Promise<string> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const result = await app.scrapeUrl(url, { formats: ["markdown"] });
    if (result.success) return result.markdown ?? "";
    if (result.error?.includes("429")) {
      const delay = Math.pow(2, attempt) * 1000;  // 1s, 2s, 4s
      await new Promise(r => setTimeout(r, delay));
      continue;
    }
    throw new Error(result.error);
  }
  throw new Error("Max retries exceeded");
}
```

## Batch Operations

When scraping more than 5 URLs, use batch scrape instead of sequential calls:

```typescript
const batchResult = await app.batchScrapeUrls(
  [
    "https://example.com/page-1",
    "https://example.com/page-2",
    "https://example.com/page-3",
  ],
  { formats: ["markdown"], onlyMainContent: true }
);

// Poll until the batch job reports completion
const status = await app.checkBatchScrapeStatus(batchResult.id);
for (const page of status.data) {
  console.log(page.url, page.markdown);
}
```

Batch scrape runs pages in parallel on Firecrawl's infrastructure, reducing wall-clock time significantly versus sequential scraping.
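
When the URL list is very large, it can be split into fixed-size batches before calling `batchScrapeUrls`. A sketch; the 100-URL batch size is an arbitrary assumption for illustration, not a documented API limit:

```typescript
// Split a URL list into batches for batchScrapeUrls. The default batch size
// is an assumption, not an API-imposed limit.
function planBatches(urls: string[], batchSize = 100): string[][] {
  const batches: string[][] = [];
  for (let i = 0; i < urls.length; i += batchSize) {
    batches.push(urls.slice(i, i + batchSize));
  }
  return batches;
}
```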


## LLM-Optimized Output Guidelines

Always request markdown, not HTML. Reasons:

  1. Markdown is 60-80% smaller than equivalent HTML
  2. Navigation, ads, and boilerplate are stripped
  3. Code blocks are preserved with language hints
  4. Tables are converted to markdown table syntax
  5. Links are preserved in [text](url) format

Pass the raw result.markdown string directly into your prompt. Do not post-process or summarize before passing to the model — let the model reason over the full content.
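
Because links survive as `[text](url)`, structured data can be recovered from the scraped markdown after the model step, for example to collect sources. A sketch of a regex-based extractor (it treats image links like ordinary links and skips edge cases such as nested brackets):

```typescript
// Pull [text](url) links back out of Firecrawl markdown.
function extractLinks(markdown: string): { text: string; url: string }[] {
  const out: { text: string; url: string }[] = [];
  for (const m of markdown.matchAll(/\[([^\]]+)\]\(([^)\s]+)\)/g)) {
    out.push({ text: m[1], url: m[2] });
  }
  return out;
}
```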

For very long pages (>50k tokens), chunk by heading sections:

```typescript
// Split before any line opening with one to three #'s (H1-H3 headings)
function chunkByHeadings(markdown: string): string[] {
  return markdown.split(/\n(?=#{1,3} )/);
}
```
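
A quick sanity check of the heading splitter on a small document; the function is repeated so the snippet is self-contained:

```typescript
// Split before any line opening with one to three #'s (H1-H3 headings).
function chunkByHeadings(markdown: string): string[] {
  return markdown.split(/\n(?=#{1,3} )/);
}

const sample = "# Intro\ntext\n## Setup\nmore text\n### Details\neven more";
const chunks = chunkByHeadings(sample);
// chunks[0] holds the "# Intro" section, chunks[2] the "### Details" section
```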

## Integration with the Research Skill

When invoked alongside /research, Firecrawl serves as the data-collection layer:

  1. Research skill generates a list of target URLs and search queries
  2. Firecrawl scrapes and maps those URLs
  3. Research skill synthesizes the markdown into a structured report

Invocation pattern:

```
/research topic="Claude Code plugin architecture"
  -> internally calls /firecrawl search "Claude Code plugin SKILL.md format"
  -> scrapes top 5 results
  -> synthesizes findings
```
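
The three-step flow can be sketched as a pipeline with the search, scrape, and synthesize stages injected; in practice those stages would be Firecrawl and research-skill calls, stubbed here so only the shape is shown:

```typescript
// Research pipeline skeleton: search for URLs, scrape each, synthesize a report.
async function researchPipeline(
  topic: string,
  search: (query: string) => Promise<string[]>,
  scrape: (url: string) => Promise<string>,
  synthesize: (docs: string[]) => string,
): Promise<string> {
  const urls = (await search(topic)).slice(0, 5); // scrape top 5 results
  const docs = await Promise.all(urls.map(scrape));
  return synthesize(docs);
}
```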

## Output Format

When this skill completes, output:

```
FIRECRAWL RESULT
URL: <scraped url>
Pages: <count>
Total characters: <count>
Status: SUCCESS | PARTIAL | FAILED

--- CONTENT ---
<markdown content>
--- END CONTENT ---
```

For crawl jobs, list each page URL and character count before the combined content block.
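
One way to assemble that block programmatically; an illustrative sketch, not a mandated helper:

```typescript
// Build the FIRECRAWL RESULT block described above from its fields.
type FirecrawlReport = {
  url: string;
  pages: number;
  totalChars: number;
  status: "SUCCESS" | "PARTIAL" | "FAILED";
  content: string;
};

function formatResult(r: FirecrawlReport): string {
  return [
    "FIRECRAWL RESULT",
    `URL: ${r.url}`,
    `Pages: ${r.pages}`,
    `Total characters: ${r.totalChars}`,
    `Status: ${r.status}`,
    "",
    "--- CONTENT ---",
    r.content,
    "--- END CONTENT ---",
  ].join("\n");
}
```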


## Error Handling

| Error code | Meaning                   | Action                           |
|------------|---------------------------|----------------------------------|
| 400        | Invalid URL or parameters | Validate URL format, check opts  |
| 401        | Invalid API key           | Check FIRECRAWL_API_KEY env var  |
| 403        | Site blocks scraping      | Try with custom headers          |
| 404        | Page not found            | Verify URL, try site map first   |
| 429        | Rate limit exceeded       | Exponential backoff              |
| 500        | Firecrawl server error    | Retry after 5 seconds            |
| timeout    | JS render took too long   | Increase waitFor or use rawHtml  |
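
The table translates naturally into a lookup that callers can branch on; the `retryable` flags are an interpretation of the recommended actions, not an API field:

```typescript
// Error-handling table as a lookup. "retryable" reflects the recommended
// action for each code, as interpreted here.
const errorActions: Record<string, { action: string; retryable: boolean }> = {
  "400": { action: "Validate URL format and options", retryable: false },
  "401": { action: "Check FIRECRAWL_API_KEY env var", retryable: false },
  "403": { action: "Retry with custom headers", retryable: true },
  "404": { action: "Verify URL; try site map first", retryable: false },
  "429": { action: "Exponential backoff", retryable: true },
  "500": { action: "Retry after 5 seconds", retryable: true },
};
```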

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.