michaelboeding

video-producer-agent

5
0
# Install this skill:
npx skills add michaelboeding/skills --skill "video-producer-agent"

Install specific skill from multi-skill repository

# Description

>

# SKILL.md


name: video-producer-agent
description: >
Use this skill to create complete videos with voiceover and music.
Triggers: "create video", "product video", "explainer video", "promo video", "demo video",
"training video", "ad video", "commercial", "marketing video", "video with voiceover",
"video with music", "brand video", "testimonial video"
Orchestrates: script, voiceover, background music, video clips/images, and final assembly.


Video Producer

Create complete videos with voiceover, music, and visuals.

This is an orchestrator skill that combines:
- Script/storyboard generation (Claude)
- Voiceover synthesis (Gemini TTS)
- Background music (Lyria)
- Video clip generation (Veo 3.1) or image animation
- Final assembly (FFmpeg via media-utils)

Workflow

Step 1: Gather Requirements (REQUIRED)

⚠️ DO NOT skip this step. DO NOT run init_project.py until you have ALL answers.

Use interactive questioning β€” ask ONE question at a time, wait for the response, then ask the next. This creates a collaborative spec-driven process.

Question Flow

⚠️ Use the AskUserQuestion tool for each question below. Do not just print questions in your response β€” use the tool to create interactive prompts with the options shown.

Q1: Subject

"I'll create that video! First β€” what's it about?

(e.g., product launch, brand story, tutorial, explainer β€” or describe your own)"

Wait for response.

Q2: Duration

"How long should the video be?

  • 15 seconds (quick hook)
  • 30 seconds (standard ad)
  • 60 seconds (explainer)
  • 2+ minutes (detailed)
  • Or specify your own duration"

Wait for response.

Q3: Style

"What visual style?

  • Premium/luxury
  • Fun/playful
  • Corporate/professional
  • Dramatic/cinematic
  • Minimal/clean
  • Or describe your own style"

Wait for response.

Q4: Assets

"Do you have existing images or video clips to use?

  • No, generate everything
  • Yes, I have images (provide paths)
  • Yes, I have video clips (provide paths)"

Wait for response.

Q5: Audio Strategy

"How should we handle audio?

  • Custom β€” I generate voiceover + background music
  • Veo native β€” Use Veo's built-in dialogue/SFX/ambient
  • Silent β€” No audio, add later"

Wait for response.

Q6: Voice (if custom audio)

"What voice tone for the voiceover?

  • Professional
  • Friendly/warm
  • Energetic
  • Calm/soothing
  • Dramatic
  • Or describe your own tone"

Wait for response.

Q7: Music (if custom audio)

"What music vibe?

  • Modern electronic
  • Cinematic/epic
  • Upbeat pop
  • Ambient/chill
  • Corporate
  • Or describe your own style"

Wait for response.

Q8: Format

"What aspect ratio?

  • 16:9 (YouTube, web)
  • 9:16 (TikTok, Reels, Shorts)
  • 1:1 (Instagram feed)"

Wait for response.

Q9: Resolution

"What resolution?

  • 720p (faster generation)
  • 1080p (standard HD)"

Wait for response.

Q10: Model

"Which Veo model?

  • veo-3.1 β€” Latest, highest quality (default)
  • veo-3.1-fast β€” Faster generation, slightly lower quality
  • veo-3 β€” Previous generation
  • veo-3-fast β€” Previous gen, faster"

Wait for response.

Quick Reference

Question Determines
Subject Scene content and prompts
Duration Scene count (Veo clips must be 4, 6, or 8 seconds)
Style Visual prompts and music selection
Assets Generate vs use existing
Audio custom, veo_audio, or silent
Voice TTS voice selection
Music Lyria prompt
Format Aspect ratio for Veo
Resolution 720p or 1080p output quality
Model veo-3.1, veo-3.1-fast, veo-3, veo-3-fast

Step 2: Initialize Project

Once you have the user's answers, initialize the project with their preferences:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-producer-agent/scripts/init_project.py \
  --name "Product Launch Video" \
  --duration 30 \
  --aspect-ratio 16:9 \
  --audio-strategy custom \
  --scenes 5

Step 3: Configure project.json

Edit project.json with scene prompts, voiceover text, and music style based on user's answers.

Step 4: Assemble the Video

python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-producer-agent/scripts/assemble.py \
  --project ~/my_video_project/

Project Structure

When you initialize a project, this folder structure is created:

my_project/
β”œβ”€β”€ project.json          # Configuration: scenes, voiceover, music, settings
β”œβ”€β”€ storyboard.md         # Planning document for the video
β”œβ”€β”€ scenes/               # Generated video clips from Veo
β”‚   β”œβ”€β”€ scene1_intro.mp4
β”‚   β”œβ”€β”€ scene2_features.mp4
β”‚   └── scene3_cta.mp4
β”œβ”€β”€ audio/                # Audio assets
β”‚   β”œβ”€β”€ voiceover.wav     # Generated voiceover
β”‚   β”œβ”€β”€ background_music.wav  # Generated music
β”‚   └── final_mix.mp3     # Mixed audio track
β”œβ”€β”€ work/                 # Intermediate files (auto-generated)
β”‚   β”œβ”€β”€ silent_scene1.mp4
β”‚   β”œβ”€β”€ video_concatenated.mp4
β”‚   └── ...
└── output/               # Final deliverables
    └── product_launch_video_final.mp4

Scripts

init_project.py

Initialize a new video project with folder structure and templates.

# Basic project
python3 init_project.py --name "My Video" --duration 30

# Create in specific directory
python3 init_project.py --name "Demo Video" --output ~/Videos/

# Vertical video for social
python3 init_project.py --name "Instagram Reel" --aspect-ratio 9:16 --duration 15

# Use Veo's native audio (no custom voiceover/music)
python3 init_project.py --name "Cinematic Scene" --audio-strategy veo_audio

# More scenes
python3 init_project.py --name "Long Video" --duration 60 --scenes 5

Options:
| Option | Default | Description |
|--------|---------|-------------|
| --name | required | Project name |
| --output | current dir | Parent directory |
| --duration | 30 | Target duration in seconds |
| --aspect-ratio | 16:9 | 16:9, 9:16, 1:1, 4:3 |
| --audio-strategy | custom | custom, veo_audio, silent |
| --scenes | 3 | Number of scene placeholders |

assemble.py

Orchestrate the full video assembly pipeline.

# Full pipeline (generate everything + assemble)
python3 assemble.py --project ~/my_project/

# Skip generation (use existing scene/audio files)
python3 assemble.py --project ~/my_project/ --skip-generation

# Dry run (show what would be done)
python3 assemble.py --project ~/my_project/ --dry-run

Pipeline steps:
1. Generate video scenes (Veo 3.1)
2. Strip audio from scenes (if custom audio)
3. Generate voiceover (Gemini TTS)
4. Generate background music (Lyria)
5. Mix voiceover + music
6. Concatenate video clips
7. Merge audio with video
8. Output final video


project.json Configuration

{
  "name": "Product Launch Video",
  "duration_target": 30,
  "aspect_ratio": "16:9",
  "resolution": "720p",
  "audio_strategy": "custom",

  "scenes": [
    {
      "id": 1,
      "name": "scene1_hero",
      "prompt": "Cinematic slow zoom on premium product, dramatic lighting, high-end commercial style",
      "duration": 6,
      "notes": "Music only, no voiceover"
    },
    {
      "id": 2,
      "name": "scene2_features",
      "prompt": "Product features demonstration, sleek animations, modern tech aesthetic",
      "duration": 8,
      "notes": "Voiceover starts here"
    },
    {
      "id": 3,
      "name": "scene3_cta",
      "prompt": "Product with logo on clean background, call to action moment",
      "duration": 6,
      "notes": "Music swells, voiceover ends"
    }
  ],

  "voiceover": {
    "enabled": true,
    "text": "Introducing the future of audio. Crystal clear sound. All-day comfort. Experience the difference.",
    "voice": "Charon",
    "style": "Professional, confident, premium brand voice"
  },

  "music": {
    "enabled": true,
    "prompt": "modern electronic, premium, sleek, product showcase, subtle bass",
    "duration": 35,
    "bpm": 100,
    "brightness": 0.6
  },

  "assembly": {
    "transition": "fade",
    "transition_duration": 0.5,
    "music_volume": 0.3,
    "fade_in": 1.0,
    "fade_out": 2.0
  }
}

Audio Strategies

Strategy Description Use When
custom Strip Veo audio, add custom voiceover + music Most videos
veo_audio Keep Veo's generated audio (dialogue, SFX) Cinematic scenes, dialogues
silent Strip audio, output silent video Adding audio later

Workflow: Creating a Video

Step 1: Initialize Project

python3 init_project.py --name "Wireless Earbuds Promo" --duration 30

Step 2: Plan the Storyboard

Edit storyboard.md to plan your video structure:

## Scene 1: Hero Reveal (0-5s)
- Visual: Earbuds emerging from shadow, premium lighting
- Audio: Music only (dramatic intro)

## Scene 2: Sound Quality (5-12s)
- Visual: Sound waves, person enjoying music
- Audio: Voiceover: "Crystal clear sound. Immersive bass."

## Scene 3: Comfort (12-20s)
- Visual: Close-up of fit, person running
- Audio: Voiceover: "All-day comfort. Secure fit."

## Scene 4: CTA (20-30s)
- Visual: Product + logo
- Audio: Voiceover: "Experience the difference." + music swell

Step 3: Configure project.json

Fill in the scene prompts, voiceover text, and music style based on your storyboard.

Step 4: Assemble

python3 assemble.py --project ~/wireless_earbuds_promo/

Step 5: Review and Iterate

Check the output in output/. If adjustments needed:

# Re-run with existing scenes (just re-mix audio)
python3 assemble.py --project ~/wireless_earbuds_promo/ --skip-generation

What You Can Create

Type Example
Product video 30s hero video showcasing a product
Explainer video How-to or feature explanation
Promo/ad video Marketing advertisement
Demo video Product demonstration
Training video Internal training content
Testimonial Customer quote style video
Brand video Company/brand story

Prerequisites

  • GOOGLE_API_KEY - For Veo (video), Gemini TTS (voice), Lyria (music)
  • FFmpeg installed: brew install ffmpeg

Video Styles & Music Pairings

Style Music Prompt Voice
Premium/Luxury "elegant, minimal, ambient, sophisticated" Charon (informative)
Tech/Modern "electronic, futuristic, clean, innovative" Kore (firm)
Fun/Playful "upbeat, cheerful, acoustic, positive" Puck (upbeat)
Corporate "professional, inspiring, orchestral lite" Orus (firm)
Lifestyle "chill, aspirational, indie, warm" Aoede (breezy)
Dramatic/Cinematic "epic, orchestral, emotional, building" Gacrux (mature)

Common Video Structures

Product Video (30s)

0-5s:   Hero shot (music only)
5-15s:  Features (voiceover + music)
15-25s: Lifestyle/use case (voiceover + music)
25-30s: Logo + CTA (music fade)

Explainer Video (60s)

0-5s:   Hook/problem statement
5-20s:  Solution introduction
20-45s: How it works (3 steps)
45-55s: Benefits summary
55-60s: CTA

Testimonial Video (45s)

0-5s:   Intro/name card
5-35s:  Testimonial quote (multiple scenes)
35-45s: Product shot + logo

Manual Workflow (Without Scripts)

If you prefer to run each step manually:

Generate scenes:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/video-generation/scripts/veo.py \
  --batch scenes.json

Generate voiceover:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
  --text "Your voiceover text..." \
  --voice Charon \
  --style "Professional, warm"

Generate music:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/music-generation/scripts/lyria.py \
  --prompt "modern electronic, premium" \
  --duration 35

Strip audio from clips:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/video_strip_audio.py \
  -i scene*.mp4

Concatenate videos:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/video_concat.py \
  -i silent_*.mp4 --transition fade -o video.mp4

Mix audio:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/audio_mix.py \
  --voice voiceover.wav --music music.wav -o audio.mp3

Merge audio + video:

python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/video_audio_merge.py \
  --video video.mp4 --audio audio.mp3 -o final.mp4

Input Files You Can Provide

File Type How It's Used
Product images Animate with Veo as first frame
Logo (PNG) Overlay on final scene
Existing voiceover Place in audio/voiceover.wav
Brand music Place in audio/background_music.wav
Video clips Place in scenes/ folder
Script/copy Use for voiceover text

Limitations

  • Veo video duration: Max 8 seconds per clip (concatenate for longer)
  • Veo 3.1 includes audio: All clips have generated audio (strip if using custom)
  • Processing time: Video generation takes 1-3 minutes per clip
  • Resolution: Currently 720p or 1080p (1080p for 8s only)

Error Handling

Error Solution
"GOOGLE_API_KEY not set" Set up API key per README
"FFmpeg not found" Install: brew install ffmpeg
"project.json not found" Run init_project.py first
Video generation timeout Retry, or use shorter duration
Audio/video sync issues Adjust scene durations

Example Prompts

Simple:

"Create a 30-second product video for my new coffee maker"

Detailed:

"Create a 45-second product video for our new wireless earbuds. Premium, luxury feel. I have product photos attached. Professional male voiceover. Modern electronic music. 16:9 for YouTube."

With project:

"Initialize a video project called 'App Demo' with 5 scenes, 60 seconds total, vertical format for TikTok"

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents:

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.