apify

by @Anshin-Health-Solutions in Data & Analytics

# Install this skill:

npx skills add Anshin-Health-Solutions/superpai --skill "apify"

Install specific skill from multi-skill repository

# Description

Social media scraping and business data extraction via Apify actors.

# SKILL.md

name: apify
description: "Social media scraping and business data extraction via Apify actors."
triggers:
- Twitter scraping
- Instagram scraping
- LinkedIn scraping
- TikTok scraping
- YouTube scraping
- Google Maps scraping
- Amazon scraping
- Apify

Apify Skill

Data extraction from social media and business platforms using Apify's actor marketplace. Actors are pre-built scrapers that run on Apify infrastructure, returning structured datasets.

Requirements

Apify API Token: Set it as the environment variable APIFY_TOKEN, or pass it directly in API calls.
Account Tier Awareness: The free tier provides $5/month of compute; monitor usage at console.apify.com.
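
A minimal sketch of the token requirement in Python, assuming the token is sent as a Bearer header rather than in the URL (the Apify API accepts either). The helper names `get_apify_token` and `auth_headers` are illustrative and not part of any Apify library.

```python
import os


def get_apify_token(env=None):
    """Return the Apify API token from APIFY_TOKEN, or raise if it is missing."""
    env = os.environ if env is None else env
    token = env.get("APIFY_TOKEN", "").strip()
    if not token:
        raise RuntimeError("APIFY_TOKEN is not set")
    return token


def auth_headers(token):
    """Build headers that carry the token without putting it in the URL."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```

Keeping the token out of the query string means it is less likely to end up in shell history or proxy logs.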

Detailed Process

Identify Target Platform -- Determine which platform the user wants to scrape (Twitter, LinkedIn, Google Maps, etc.).
Select Actor -- Choose the correct actor ID from the table below based on platform and data type.
Configure Run Input -- Build the JSON input payload with search terms, URLs, result limits, and filters. Field names vary by actor, so check the actor's input schema.
Execute Actor Run -- POST to the Apify API to start the actor run.
Poll for Completion -- Check run status until it reaches a terminal state: SUCCEEDED, FAILED, ABORTED, or TIMED-OUT.
Download Dataset -- Fetch results from the dataset endpoint, handling pagination if needed.
Format Output -- Return structured JSON to the user.
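
The process above can be sketched end to end as follows. This is an outline under stated assumptions, not the skill's actual implementation: `client` is a hypothetical object exposing `start_run`, `get_run`, and `get_items`, standing in for whatever HTTP layer is used. The run fields `id`, `status`, and `defaultDatasetId` follow the Apify run object.

```python
import time


def scrape(actor_id, run_input, client, poll_seconds=5.0, sleep=time.sleep):
    """Run one actor from start to formatted output.

    `client` is a hypothetical wrapper around the Apify HTTP API.
    """
    # Execute Actor Run
    run = client.start_run(actor_id, run_input)

    # Poll for Completion: READY and RUNNING are the normal in-progress states.
    while run["status"] in ("READY", "RUNNING"):
        sleep(poll_seconds)
        run = client.get_run(run["id"])
    if run["status"] != "SUCCEEDED":
        raise RuntimeError(f"run {run['id']} ended with status {run['status']}")

    # Download Dataset
    items = client.get_items(run["defaultDatasetId"])

    # Format Output
    return {"items": items, "total": len(items)}
```

Passing `sleep` as a parameter lets the loop be exercised in tests without real delays.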

Actor Reference Table

| Platform | Actor ID | Data Type | Cost Estimate (USD) |
|---|---|---|---|
| Twitter/X | `apidojo/tweet-scraper` | Tweets, profiles, followers | ~$0.50/1K tweets |
| LinkedIn | `anchor/linkedin-people-search` | People profiles, companies | ~$2.00/1K profiles |
| Google Maps | `compass/crawler-google-places` | Business listings, reviews | ~$1.00/1K places |
| Instagram | `apify/instagram-scraper` | Posts, profiles, hashtags | ~$0.80/1K posts |
| YouTube | `bernardo/youtube-scraper` | Videos, channels, comments | ~$0.30/1K videos |
| Amazon | `junglee/amazon-crawler` | Products, reviews, prices | ~$1.50/1K products |
| TikTok | `clockworks/tiktok-scraper` | Videos, profiles, hashtags | ~$0.60/1K videos |
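
One way to encode the table for actor selection is a plain lookup, sketched below. The actor IDs are copied from the table; the lowercase platform keys, the `ALIASES` map, and the function name `select_actor` are assumptions made for illustration.

```python
# Actor IDs copied from the reference table; keys are illustrative.
ACTORS = {
    "twitter": "apidojo/tweet-scraper",
    "linkedin": "anchor/linkedin-people-search",
    "google maps": "compass/crawler-google-places",
    "instagram": "apify/instagram-scraper",
    "youtube": "bernardo/youtube-scraper",
    "amazon": "junglee/amazon-crawler",
    "tiktok": "clockworks/tiktok-scraper",
}

# Hypothetical alternative spellings a user might type.
ALIASES = {"x": "twitter", "twitter/x": "twitter", "maps": "google maps"}


def select_actor(platform):
    """Return the actor ID for a platform name, ignoring case and whitespace."""
    key = platform.strip().lower()
    key = ALIASES.get(key, key)
    if key not in ACTORS:
        raise ValueError(f"no actor configured for platform {platform!r}")
    return ACTORS[key]
```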

API Invocation Pattern

Start an Actor Run

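# In the URL path, write the actor ID with a tilde: apidojo~tweet-scraper, not apidojo/tweet-scraper.
curl -X POST "https://api.apify.com/v2/acts/{ACTOR_ID}/runs?token=${APIFY_TOKEN}" \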
  -H "Content-Type: application/json" \
  -d '{
    "searchTerms": ["query here"],
    "maxItems": 100,
    "proxy": { "useApifyProxy": true }
  }'
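
The same request can be sketched in Python with only the standard library. The function below builds the request without sending it, so it can be inspected first. Two details are assumptions beyond the curl example: the token travels in an `Authorization: Bearer` header instead of the query string, and the slash in the actor ID is swapped for a tilde, which is the form the API path expects. `build_start_run_request` is an illustrative name.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_start_run_request(actor_id, run_input, token):
    """Build (but do not send) the POST request that starts an actor run."""
    # "username/actor-name" becomes "username~actor-name" in the URL path.
    path_id = actor_id.replace("/", "~")
    return urllib.request.Request(
        url=f"{API_BASE}/acts/{path_id}/runs",
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it is then a call to `urllib.request.urlopen(req)`; the new run's ID is under `data.id` in the JSON response.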

Check Run Status

curl "https://api.apify.com/v2/actor-runs/{RUN_ID}?token=${APIFY_TOKEN}"
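
A polling loop around this endpoint might look like the sketch below. It assumes a caller-supplied `get_status` function (hypothetical) that performs the GET above and returns the run's `status` string. The terminal set includes ABORTED and TIMED-OUT alongside SUCCEEDED and FAILED, so the loop cannot spin forever on a run that was stopped.

```python
import time

# Statuses after which an Apify run will not change again.
TERMINAL = {"SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"}


def wait_for_run(get_status, poll_seconds=5.0, max_wait_seconds=3600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until a terminal status or until the deadline passes."""
    deadline = clock() + max_wait_seconds
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"run still {status} after {max_wait_seconds}s")
        sleep(poll_seconds)
```

The client-side deadline is separate from the run's own timeout; it only bounds how long the caller keeps waiting.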

Download Dataset Results

curl "https://api.apify.com/v2/datasets/{DATASET_ID}/items?token=${APIFY_TOKEN}&format=json&limit=1000&offset=0"
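
Paging through a larger dataset with `offset` and `limit` can be sketched as below. `fetch_page(offset, limit)` is a hypothetical callable that performs one request to the items endpoint and returns the parsed list of items; `fetch_all_items` and `max_items` are likewise illustrative names.

```python
def fetch_all_items(fetch_page, page_size=1000, max_items=None):
    """Collect dataset items page by page until the dataset is exhausted.

    fetch_page(offset, limit) must return the list of items for one request.
    """
    items = []
    offset = 0
    while max_items is None or len(items) < max_items:
        limit = page_size
        if max_items is not None:
            limit = min(page_size, max_items - len(items))
        page = fetch_page(offset, limit)
        items.extend(page)
        if len(page) < limit:  # a short page means there is nothing left
            break
        offset += len(page)
    return items
```

A real `fetch_page` would GET `/v2/datasets/{DATASET_ID}/items` with `offset` and `limit` as query parameters and parse the JSON array in the response.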

Rate Limiting and Best Practices

Concurrency: Free tier allows 1 concurrent run; paid tiers allow more. Do not start multiple runs simultaneously on free tier.
Max Items: Always set maxItems to avoid runaway costs. Start with 100, increase only if needed.
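Proxy Usage: Enable Apify Proxy ("useApifyProxy": true in the actor's proxy settings) to reduce the risk of IP bans on target platforms.
Pagination: Request dataset results in pages of at most 1000 items (limit=1000) and advance the offset parameter to page through larger datasets.
Timeouts: Actor runs have a default 1-hour timeout. For shorter runs, set a timeout in seconds when starting the run (the timeout query parameter in the REST API, reported as timeoutSecs in the run's options).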
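
The max-items and proxy practices above could be enforced in code before any run starts. The sketch below assumes the input field names `maxItems` and `proxy` from the earlier curl example; real actors may name these differently. The `hard_cap` value and the function name `with_safe_defaults` are hypothetical.

```python
def with_safe_defaults(run_input, default_max_items=100, hard_cap=1000):
    """Return a copy of run_input with a bounded maxItems and the proxy enabled."""
    safe = dict(run_input)

    # Start at 100 items unless the caller asks for more, and never exceed the cap.
    requested = safe.get("maxItems", default_max_items)
    safe["maxItems"] = max(1, min(int(requested), hard_cap))

    # Turn the Apify proxy on unless the caller set it explicitly.
    proxy = dict(safe.get("proxy") or {})
    proxy.setdefault("useApifyProxy", True)
    safe["proxy"] = proxy
    return safe
```

Returning a copy keeps the caller's original input unchanged, which makes a retry with a higher limit straightforward.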

Output Handling

Actor runs produce datasets. Each dataset item is a JSON object with platform-specific fields. The items endpoint itself returns a bare JSON array; a common way to package the results for the user is:

{
  "items": [
    {
      "id": "platform-specific-id",
      "text": "Content text",
      "author": "Username or profile",
      "date": "ISO 8601 timestamp",
      "metrics": { "likes": 42, "shares": 7, "comments": 3 },
      "url": "Direct link to content"
    }
  ],
  "total": 100,
  "offset": 0,
  "limit": 1000
}
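
Because each actor names its fields differently, mapping raw items onto the common shape above needs a small normalization step. The sketch below is one possible approach; every source key other than the target field names (`caption`, `ownerUsername`, `likeCount`, and so on) is a guess for illustration and should be checked against the actual actor output.

```python
def first_present(item, *keys, default=None):
    """Return the value of the first key that exists with a non-None value."""
    for key in keys:
        if item.get(key) is not None:
            return item[key]
    return default


def normalize(item):
    """Map one raw actor item onto the common fields (source keys are guesses)."""
    return {
        "id": first_present(item, "id"),
        "text": first_present(item, "text", "caption", "title"),
        "author": first_present(item, "author", "username", "ownerUsername"),
        "date": first_present(item, "date", "createdAt", "timestamp"),
        "metrics": {
            "likes": first_present(item, "likeCount", "likesCount", default=0),
            "shares": first_present(item, "shareCount", "sharesCount", default=0),
            "comments": first_present(item, "commentCount", "commentsCount", default=0),
        },
        "url": first_present(item, "url"),
    }
```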

Cost Awareness

Always check actor pricing before running (visible on actor page)
Set maxItems conservatively -- you can always run again for more
Monitor usage at https://console.apify.com/billing
Free tier resets monthly; paid compute units do not roll over
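
A rough pre-run cost check can be sketched from the estimates in the actor reference table. The rates below are those approximate figures, not live pricing, and the $5 budget is the free-tier figure stated in the requirements. `estimate_cost` and `fits_budget` are illustrative names.

```python
# Approximate USD per 1,000 results, copied from the actor reference table.
COST_PER_1K = {
    "apidojo/tweet-scraper": 0.50,
    "anchor/linkedin-people-search": 2.00,
    "compass/crawler-google-places": 1.00,
    "apify/instagram-scraper": 0.80,
    "bernardo/youtube-scraper": 0.30,
    "junglee/amazon-crawler": 1.50,
    "clockworks/tiktok-scraper": 0.60,
}
FREE_TIER_MONTHLY_USD = 5.00


def estimate_cost(actor_id, max_items):
    """Estimate the USD cost of a run that returns max_items results."""
    return round(COST_PER_1K[actor_id] * max_items / 1000, 4)


def fits_budget(actor_id, max_items, spent_usd=0.0, budget_usd=FREE_TIER_MONTHLY_USD):
    """Check whether a planned run stays within the remaining monthly budget."""
    return spent_usd + estimate_cost(actor_id, max_items) <= budget_usd
```

For example, `estimate_cost("apidojo/tweet-scraper", 100)` gives 0.05, so the default 100-item run is a small fraction of the free tier.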

When to Use

User asks to "scrape Twitter", "get LinkedIn profiles", "find Google Maps businesses"
Large-scale data collection from social platforms (beyond what WebFetch can handle)
Structured data extraction with specific field requirements (metrics, dates, engagement)
Recurring data collection tasks that benefit from Apify's scheduling features
When direct API access to a platform is unavailable or rate-limited

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents:

⚡ Amp 🚀 Antigravity 🤖 Claude Code 🦀 Clawdbot 📝 Codex ▶️ Cursor 🤖 Droid 💎 Gemini CLI 🐙 GitHub Copilot 🪿 Goose 📊 Kilo Code 🔧 Kiro CLI 💻 OpenCode 🦘 Roo Code 🌲 Trae 🏄 Windsurf
