Install a specific skill from the multi-skill repository:

`npx skills add oberon-mini/oberon-skills --skill "threads-archiver"`
# Description
Archive and maintain Meta Threads post history into structured Markdown (index + per-post records) with stable URLs, dates, and content. Use when asked to scrape/export a Threads profile, backfill missing posts, deduplicate archives, convert raw post data to note format, or update a knowledge base (for example Obsidian) with chronological Threads records.
# SKILL.md
---
name: threads-archiver
description: Archive and maintain Meta Threads post history into structured Markdown (index + per-post records) with stable URLs, dates, and content. Use when asked to scrape/export a Threads profile, backfill missing posts, deduplicate archives, convert raw post data to note format, or update a knowledge base (for example Obsidian) with chronological Threads records.
---
## Threads Archiver
Archive Threads data in a repeatable, reviewable format.
### Core Workflow
- Identify archive scope.
- Collect post data.
- Normalize records.
- Merge and deduplicate.
- Render/update markdown output.
- Validate ordering and coverage.
### 1) Identify archive scope
Capture these inputs before processing:
- Target account handle (for example `@macau.drive.exam`)
- Output note path
- Mode: `full`, `incremental`, or `repair`
- Required fields: `date`, `url`, `content` (minimum)
If scope is unclear, ask one short clarification question, then proceed.
### 2) Collect post data
Prefer one of these sources:
- Existing exported list (CSV/JSON/JSONL)
- Previously archived markdown note
- Browser-captured rows copied by the user
When using browser capture, gather at least:
- Canonical post URL
- Post date/time (or best available date)
- Full visible text content (or best-effort text)
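A captured row can be held in a small record type before normalization. This is an illustrative sketch only; the field names follow the skill's `date`/`url`/`content` schema, and the example URL is hypothetical:

```python
from dataclasses import dataclass

@dataclass
class PostRecord:
    """One captured Threads post (field names follow the skill's schema)."""
    url: str       # canonical post URL
    date: str      # ISO date, or best available date string
    content: str   # full visible text, best effort

# Example row as it might arrive from a browser capture (hypothetical URL):
row = PostRecord(
    url="https://www.threads.net/@example/post/ABC123",
    date="2024-05-01",
    content="First line\nSecond line",
)
```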
### 3) Normalize records
Normalize each record into this logical schema:
- `date`: ISO date (YYYY-MM-DD) when possible
- `url`: canonical Threads URL
- `content`: post text (trimmed, preserve line breaks where meaningful)
Use `scripts/build_threads_archive.py` to normalize CSV/JSON/JSONL inputs and emit markdown-ready entries.
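The normalization step can be sketched as follows. This is not the actual logic of `scripts/build_threads_archive.py`, just an assumed minimal version: best-effort ISO dates, trimmed fields, internal line breaks preserved. The accepted date formats are assumptions:

```python
import datetime

def normalize_record(raw: dict) -> dict:
    """Normalize one raw post row into the date/url/content schema.

    Illustrative sketch; the real logic lives in scripts/build_threads_archive.py.
    """
    # Best-effort ISO date: try a few common input formats (assumed set).
    date = ""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y", "%Y-%m-%dT%H:%M:%S"):
        try:
            date = datetime.datetime.strptime(raw.get("date", "").strip(), fmt).date().isoformat()
            break
        except ValueError:
            continue
    return {
        "date": date or raw.get("date", "").strip(),  # fall back to raw string
        "url": raw.get("url", "").strip(),
        # Trim outer whitespace but keep internal line breaks.
        "content": raw.get("content", "").strip(),
    }
```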
### 4) Merge and deduplicate
Deduplicate by URL first.
If the URL is missing, use the fallback key: `date` + first 80 characters of `content`.
When duplicates conflict:
- Keep the record with richer content (longer non-whitespace text)
- Keep canonical URL variant
### 5) Render/update markdown output
Use the format in `references/archive-format.md`.
Default output sections:
- Metadata header (account + generated time)
- Chronological index (newest first)
- Detailed per-post entries (newest first)
For incremental updates:
- Insert only new posts
- Preserve existing manually edited commentary blocks if present
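A full (non-incremental) render of the three default sections might look like the sketch below. The exact headings and index line format are assumptions; the canonical template is in `references/archive-format.md`:

```python
import datetime

def render_archive(handle: str, records: list[dict]) -> str:
    """Render metadata header, index, and per-post entries, newest first.

    Illustrative sketch; heading names are assumed, not the canonical template.
    """
    posts = sorted(records, key=lambda r: r["date"], reverse=True)  # newest first
    lines = [
        f"# Threads Archive: {handle}",
        f"Generated: {datetime.date.today().isoformat()}",
        "",
        "## Index",
    ]
    for p in posts:
        # One index row per post: date plus a short content preview linking to the URL.
        lines.append(f"- {p['date']} [{p['content'][:40]}]({p['url']})")
    lines += ["", "## Posts"]
    for p in posts:
        lines += [f"### {p['date']}", p["url"], "", p["content"], ""]
    return "\n".join(lines)
```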
### 6) Validate ordering and coverage
Validate before finalizing:
- Dates sorted newest → oldest
- Every detailed post has a matching index row
- No duplicate URLs
- No empty `content` unless explicitly unavailable
Report summary:
- Total posts
- New posts added
- Duplicates removed
- Earliest and latest post date in archive
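The checks and the summary report can be combined into one final pass. A minimal sketch under the assumption that records are already in `date`/`url`/`content` form and that `new_count`/`dupes_removed` are tracked by the caller:

```python
def validate_and_summarize(records: list[dict], new_count: int, dupes_removed: int) -> dict:
    """Run the final checks, then build the summary report. Sketch only."""
    urls = [r["url"] for r in records if r["url"]]
    assert len(urls) == len(set(urls)), "duplicate URLs remain"
    dates = [r["date"] for r in records]
    # Newest -> oldest ordering check.
    assert dates == sorted(dates, reverse=True), "not sorted newest-first"
    assert all(r["content"] for r in records), "empty content present"
    return {
        "total_posts": len(records),
        "new_posts": new_count,
        "duplicates_removed": dupes_removed,
        "earliest": min(dates),
        "latest": max(dates),
    }
```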
### Resources

- `scripts/build_threads_archive.py`: Convert raw post exports (CSV/JSON/JSONL) into normalized records and markdown sections.
- `references/archive-format.md`: Canonical markdown template and formatting rules for index + detailed entries.
# Supported AI Coding Agents
This skill follows the SKILL.md standard and works with all major AI coding agents. See the SKILL.md standard for how to use these skills with your preferred AI coding agent.