# web-scraping

A skill from the AstraBert/scpr repository.
# Install this skill

```shell
npx skills add AstraBert/scpr --skill "web-scraping"
```

This installs a specific skill from the multi-skill `AstraBert/scpr` repository.

# Description

Scrape web pages based on a provided URL using the scpr CLI app.

# SKILL.md

```yaml
---
name: web-scraping
description: Scrape web pages based on a provided URL using the scpr CLI app.
---
```

When asked to scrape a web page, use the scpr command line interface.

Basic usage (scrape a single page):

```shell
scpr --url https://example.com --output ./scraped
```

This scrapes the page and saves it as a markdown file in the `./scraped` folder.

## Recursive scraping

To scrape a page and all linked pages within the same domain:

```shell
scpr --url https://example.com --output ./scraped --recursive --allowed example.com --max 3
```

## Parallel scraping

Speed up recursive scraping with multiple threads:

```shell
scpr --url https://example.com --output ./scraped --recursive --allowed example.com --max 2 --parallel 5
```

## Additional options

- `--log` - Set the logging level (`info`, `debug`, `warn`, `error`)
- `--max` - Maximum depth of links to follow (default: 1)
- `--parallel` - Number of concurrent threads (default: 1)
- `--allowed` - Allowed domains for recursive scraping (can be specified multiple times)
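The options above can be combined in a single invocation. A sketch, using only the flags documented here (the URL and domain names are placeholders):

```shell
scpr --url https://docs.example.com --output ./scraped \
  --recursive \
  --allowed docs.example.com --allowed example.com \
  --max 2 --parallel 4 \
  --log debug
```

Passing `--allowed` twice restricts recursion to both domains while still following links between them.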

For more details, run:

```shell
scpr --help
```

Once scraping finishes, scan the output folder to find the content the user asked for. Here is an example flow:

```shell
scpr --url https://example.com --output ./scraped --recursive --allowed example.com --max 2
cd ./scraped
grep -r "pattern of interest"
```
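When you only need to know which files matched rather than the matching lines, `grep -rl` prints file paths instead. A self-contained sketch using a mock output folder (`scpr` is not run here; the file name and contents are placeholders):

```shell
# Create a mock scraped folder with one markdown file (placeholder content)
mkdir -p ./scraped
printf '# Example Domain\n\nThis domain is for use in illustrative examples.\n' > ./scraped/example.md

# List only the files containing the pattern
grep -rl "illustrative" ./scraped
# prints: ./scraped/example.md
```

From there, open the matched files to extract the specific content requested.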

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents that support it.