jyu-lt

format-transcript

# Install this skill:
npx skills add jyu-lt/agent-skills --skill "format-transcript"

This command installs one specific skill from a multi-skill repository.

# Description

Convert raw interview and podcast transcripts into well-formatted markdown documents with proper structure, paragraphs, and headings. Use when working with transcript files, formatting interviews or conversations, or when the user mentions converting transcripts to markdown.

# SKILL.md


---
name: format-transcript
description: Convert raw interview and podcast transcripts into well-formatted markdown documents with proper structure, paragraphs, and headings. Use when working with transcript files, formatting interviews or conversations, or when the user mentions converting transcripts to markdown.
---


Format Transcript

This skill guides you through converting raw transcript files (typically line-by-line format) into polished, readable markdown documents.

Core Principles

  1. Read and Process Directly: Don't write scripts to parse the text. Read the transcript file yourself and produce the formatted version directly, applying judgment to every sentence.
  2. Chunked Processing: For large files, do not try to read the whole file at once. Process in logical chunks (e.g., 800-2000 lines) and append the results iteratively.
  3. Quality over Speed: Take time to understand the conversation flow. Create meaningful thematic sections, not just a wall of text.

Formatting Workflow

1. Assess and Plan

  • Check File Size: If the file is small (< 1000 lines), read it all. If large, plan to tackle it in chunks.
  • Identify Speakers: Quickly scan the beginning to identify the speakers and their roles.
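
The size check above can be sketched as a small helper. This is a minimal illustration, not part of the skill: the function names `count_lines` and `choose_strategy` are hypothetical, and the 1000-line threshold is taken from the "Check File Size" step. It only decides how to read the file; the formatting itself is still done by reading the text directly.

```python
from pathlib import Path

# Threshold from the "Check File Size" step; adjust if your limits differ.
SMALL_FILE_LIMIT = 1000


def count_lines(path: str) -> int:
    """Count lines by streaming the file instead of loading it all at once."""
    with Path(path).open("r", encoding="utf-8", errors="replace") as handle:
        return sum(1 for _ in handle)


def choose_strategy(line_count: int, limit: int = SMALL_FILE_LIMIT) -> str:
    """Return "whole" for small files and "chunked" for everything else."""
    return "whole" if line_count < limit else "chunked"
```

For example, `choose_strategy(count_lines("interview.txt"))` (with a hypothetical file name) tells you whether to read the transcript in one pass or plan a chunked loop.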

2. The Chunked Processing Loop (For Large Files)

Follow this loop until the whole file has been processed:

  1. Read a Chunk: Use view_file to read the next segment (e.g., lines 1-1000, then 1001-2000).
  2. Format in Memory: Process the raw text into the desired Markdown format (see "Formatting Rules" below).
  3. Append to File: Use run_command with cat <<'EOF' >> target_file.md to append the formatted text. Quote the heredoc delimiter ('EOF') so the shell does not expand $ or backticks that appear in the transcript.
    • Tip: Appending with cat >> is safer than write_to_file for large documents because it cannot overwrite previous progress. Never use write_to_file to append.
  4. Track Progress: Note the last line number processed so you can resume from the correct place.
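
The bookkeeping in this loop can be sketched as follows. This is an illustration under stated assumptions, not the skill's own tooling: the skill reads with view_file and appends with run_command, while `next_range` and `append_chunk` are hypothetical Python stand-ins that show how the last processed line determines the next read and why append mode preserves earlier output.

```python
from typing import Optional, Tuple


def next_range(
    last_processed: int, total_lines: int, chunk_size: int = 1000
) -> Optional[Tuple[int, int]]:
    """Return the next inclusive 1-based (start, end) range, or None when done.

    last_processed is the last line number already formatted (0 at the start).
    """
    if last_processed >= total_lines:
        return None
    start = last_processed + 1
    end = min(last_processed + chunk_size, total_lines)
    return start, end


def append_chunk(target_path: str, formatted_text: str) -> None:
    """Append formatted Markdown; mode "a" never truncates existing content."""
    with open(target_path, "a", encoding="utf-8") as handle:
        # Normalize trailing newlines so chunks are separated by one blank line.
        handle.write(formatted_text.rstrip("\n") + "\n\n")
```

Storing only the last processed line number is enough to resume: pass it back to `next_range` after an interruption and the loop continues from the following line.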

3. Formatting Rules

A. Text Transformation

  • Line-by-Line to Paragraphs: Merge raw lines into coherent paragraphs (3-7 sentences).
  • Clean Up:
    • Remove: "Um", "uh", "you know", "like" (when used as filler), stutters, and false starts.
    • Keep: The speaker's voice and distinct phrasing, plus laughter brackets like (laughs) when they add context.
    • Fix: Transcription errors (e.g., "Mark Andre" -> "Marc Andreessen").

B. Structure & Headers

  • Thematic Headings: Use ## and ### to break the conversation into topics.
    • Bad: ## Part 1, ## Timestamp 10:00
    • Good: ## The Crisis in Higher Education, ### The Federal Funding Cartel
  • Speaker Labels: Use Bold formatting: **Speaker Name:**.
  • Emphasis: Use bold for key concepts or "punchlines".

C. Highlights (Optional)

  • If a specific section is very dense, use a bulleted list to summarize the key arguments before or after the full text.
  • Pull quotes can be used for standout lines: > "The ring of power is inherently corrupting."

Output Template

# [Descriptive Title]

## Introduction
[Brief intro or setting the stage]

---

## [Thematic Header]

**Speaker A:** content...

**Speaker B:** content...

### [Sub-topic Header]

**Speaker A:** content...

---

Quality Checklist

  • [ ] No Wall of Text: Are there frequent headers and paragraph breaks?
  • [ ] Speaker Clarity: Is it always clear who is talking?
  • [ ] Clean Read: Does it read like a polished article or book, not a raw transcript?
  • [ ] Completeness: Did you process the entire chunk without skipping the middle?

Common Pitfalls

  • Hallucination: Don't invent transitions. If the topic jumps, just use a new Header.
  • Over-Summarization: The goal is a transcript, not a summary. Keep the bulk of the content, just clean it up.
  • Losing the Thread: When processing chunks, ensure you don't cut off a sentence in the middle. Overlap your reads by 5-10 lines if necessary to catch context.
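
One way to plan overlapping reads is sketched below. It is a hedged illustration only: `overlapping_ranges` is a hypothetical helper, and the default overlap of 10 lines comes from the 5-10 line suggestion above. Each tuple separates the line where reading starts (for context) from the line where formatting starts, so the overlap is read twice but written once.

```python
from typing import List, Tuple


def overlapping_ranges(
    total_lines: int, chunk_size: int = 1000, overlap: int = 10
) -> List[Tuple[int, int, int]]:
    """Plan reads as (read_from, format_from, end), all 1-based and inclusive.

    Lines read_from..format_from-1 are context only; format_from..end is the
    part to format and append, so no line is emitted twice.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    ranges = []
    format_from = 1
    while format_from <= total_lines:
        end = min(format_from + chunk_size - 1, total_lines)
        read_from = max(1, format_from - overlap)
        ranges.append((read_from, format_from, end))
        format_from = end + 1
    return ranges
```

If a chunk still ends mid-sentence, a reasonable choice is to stop formatting at the last complete sentence and record that line as the resume point instead of the planned end.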
