Install a specific skill from a multi-skill repository:

```bash
npx skills add zysilm-ai/video-producer-skill --skill "gemini-video-producer"
```
# SKILL.md

```yaml
---
name: ai-video-producer
description: >
  AI video production workflow using Google Flow via MCP Playwright browser
  automation. Creates any video type: promotional, educational, narrative,
  social media, animations, game trailers, music videos, product demos, and
  more. Use when users want to create videos with AI, need help with video
  storyboarding, keyframe generation, or video prompt writing. Follows a
  philosophy-first approach: establish visual style and production philosophy,
  then execute scene by scene with user feedback at each stage. Requires MCP
  Playwright server and a Google account with Flow access (Google AI Pro or
  Ultra subscription).
allowed-tools: Read, Write, Edit, Glob, Grep, AskUserQuestion, TodoWrite, Task, Bash
---
```
## AI Video Producer (MCP Edition)
Create professional AI-generated videos through a structured, iterative workflow using Google Flow via MCP Playwright.
### Architecture Overview
This skill uses a main agent + sub-agents architecture for efficient context management:
```
┌───────────────────────────────────────────────────────────────────┐
│ MAIN AGENT (this skill)                                           │
│ - Handles all user interaction and approvals                      │
│ - Creates philosophy.md, style.json, scene-breakdown.md           │
│ - Generates pipeline.json                                         │
│ - Orchestrates sub-agents via Task tool                           │
│ - Updates pipeline.json status after each sub-agent returns       │
│ - Runs FFmpeg concatenation                                       │
└────────────────────────────┬──────────────────────────────────────┘
                             │ Task tool spawns general-purpose agents
                             │ with embedded instructions from .claude/agents/
          ┌──────────────────┼──────────────────┐
          ▼                  ▼                  ▼
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ asset-generator  │ │keyframe-generator│ │segment-generator │
│(general-purpose) │ │(general-purpose) │ │(general-purpose) │
├──────────────────┤ ├──────────────────┤ ├──────────────────┤
│ Fresh context    │ │ Fresh context    │ │ Fresh context    │
│ MCP browser      │ │ MCP browser      │ │ MCP browser      │
│ Returns path     │ │ Returns path     │ │ Returns paths    │
│ + status only    │ │ + status only    │ │ + status only    │
└──────────────────┘ └──────────────────┘ └──────────────────┘
```
Benefits:
- Each generation has isolated memory (no context pollution)
- Browser automation details don't clutter main conversation
- Parallel execution possible for independent tasks
- Easy retry of individual failed generations
- Main agent stays focused on orchestration
Implementation Note: Sub-agents are spawned using subagent_type="general-purpose" with the full instructions from .claude/agents/*.md files embedded in the prompt. This achieves the same isolation benefits while using Claude Code's built-in agent system.
### Prerequisites & Setup
Required:
- Google account with Flow access (Google AI Pro or Ultra subscription)
- Internet connection
MCP Playwright: if it is not installed, install it with:

```bash
claude mcp add playwright -- npx @playwright/mcp@latest
```

No Python scripts are required for browser control; Claude drives the browser directly via MCP.
#### Auto-Setup Check
At workflow start, verify MCP Playwright is available:
1. Try calling mcp__playwright__browser_snapshot()
2. If tools unavailable, offer to install: claude mcp add playwright -- npx @playwright/mcp@latest
3. After install, user must restart Claude Code for MCP to load
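For users who prefer to verify from a shell first, a rough helper is sketched below. It assumes the `claude` CLI is on PATH and uses the `claude mcp list` / `claude mcp add` subcommands; everything else is illustrative:

```python
# check_mcp.py -- rough sketch: checks whether the Playwright MCP server is
# registered with Claude Code and registers it if missing.
# Assumes the `claude` CLI is on PATH; adapt to your environment.
import subprocess

def ensure_playwright_mcp() -> bool:
    listing = subprocess.run(
        ["claude", "mcp", "list"], capture_output=True, text=True
    )
    if "playwright" in listing.stdout:
        return True  # already registered
    subprocess.run(
        ["claude", "mcp", "add", "playwright", "--",
         "npx", "@playwright/mcp@latest"],
        check=True,
    )
    print("Installed. Restart Claude Code so the MCP server loads.")
    return False

if __name__ == "__main__":
    ensure_playwright_mcp()
```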
### Google Flow Overview
URL: https://labs.google/fx/flow
Flow is a project-based AI filmmaking tool with these generation modes:
| Mode | German Label | Model | Purpose |
|---|---|---|---|
| Text to Image | "Bild erstellen" | Nano Banana Pro | Generate assets and keyframes |
| Video from Frames | "Video aus Frames" | Veo 3.1 - Quality | Generate video segments from start frame |
| Text to Video | "Video aus Text" | Veo 3.1 - Quality | Generate video from text only |
| Video from Elements | "Video aus Elementen" | Veo 3.1 - Quality | Generate video with reference elements |
Key Interface Elements:
- Mode selector dropdown (combobox) to switch between generation types
- Text input field for prompts
- "add" buttons for uploading reference images/frames
- "Erstellen" (Create) button to start generation
- Generated content appears in the gallery (Videos/Images tabs)
- Settings button ("tune" / "Einstellungen") to configure model and output count
REQUIRED Video Settings (configure via Settings button):
- Model: Veo 3.1 - Quality (NOT Veo 3.1 - Fast)
- Outputs per prompt: 1 (NOT 2)
- Aspect ratio: Querformat 16:9 (Landscape)
### MANDATORY WORKFLOW REQUIREMENTS
YOU MUST FOLLOW THESE RULES:
- ALWAYS use TodoWrite at the start to create a task list for the entire workflow
- NEVER skip phases - complete each phase in order before proceeding
- ALWAYS create required files - philosophy.md, style.json, scene-breakdown.md, and pipeline.json are REQUIRED
- ALWAYS break videos into multiple scenes - minimum 2 scenes for any video over 5 seconds
- ALWAYS ask user for approval before proceeding to the next phase
- NEVER generate without a complete pipeline.json - plan ALL prompts first, execute second
- ALWAYS use sub-agents for generation - use Task tool to spawn asset-generator, keyframe-generator, segment-generator
- ALWAYS update pipeline.json after each sub-agent returns with status
- ALWAYS move downloads to correct locations - files download to .playwright-mcp/; sub-agents handle this
### Sub-Agent Definitions
Three sub-agents are defined in .claude/agents/:
#### asset-generator
- Purpose: Generate ONE asset image (character, background, object) via Flow
- Input: asset_id, prompt, output_path, project_dir
- Output: JSON with status, asset_id, output_path, message
- Uses: Flow "Bild erstellen" mode (Nano Banana Pro)
#### keyframe-generator
- Purpose: Generate ONE scene starting keyframe via Flow
- Input: scene_id, prompt, output_path, project_dir, style_context
- Output: JSON with status, scene_id, output_path, message
- Uses: Flow "Bild erstellen" mode (Nano Banana Pro)
#### segment-generator
- Purpose: Generate ONE video segment (8 seconds max) via Flow
- Input: segment_id, scene_id, motion_prompt, start_frame_path, output_video_path, project_dir, extract_end_frame, end_frame_path
- Output: JSON with status, segment_id, scene_id, output_video_path, end_frame_path, message
- Uses: Flow "Video aus Frames" mode (Veo 3.1 - Quality)
### How to Invoke Sub-Agents
Use the Task tool with subagent_type="general-purpose" and embed the agent instructions from .claude/agents/ in the prompt:
```python
Task(
    subagent_type="general-purpose",
    prompt='''
You are an asset-generator sub-agent. Follow these instructions:

[Read and embed full contents of .claude/agents/asset-generator.md here]

---
NOW EXECUTE THIS TASK:
{
  "asset_id": "hero_character",
  "prompt": "A heroic knight in silver armor, standing tall, dramatic lighting",
  "output_path": "output/project/assets/characters/hero.png",
  "project_dir": "D:/Project/gemini-video-producer-skill/output/project"
}
''',
    description="Generate hero character asset"
)
```
IMPORTANT: Before spawning sub-agents, use the Read tool to load the agent instructions:
- Read(".claude/agents/asset-generator.md") for asset generation
- Read(".claude/agents/keyframe-generator.md") for keyframe generation
- Read(".claude/agents/segment-generator.md") for segment generation
Then embed those instructions in the Task prompt.
Parallel Execution: For independent tasks, spawn multiple sub-agents in a single message:
```python
# Assets can be generated in parallel (all use general-purpose with embedded instructions)
Task(subagent_type="general-purpose", prompt="[asset-generator instructions] + asset 1 task", description="Generate asset 1")
Task(subagent_type="general-purpose", prompt="[asset-generator instructions] + asset 2 task", description="Generate asset 2")
Task(subagent_type="general-purpose", prompt="[asset-generator instructions] + asset 3 task", description="Generate asset 3")
```
Sequential Execution: Segments within a scene must be sequential (frame chaining):
```python
# Segment A first (uses keyframe)
result_A = Task(subagent_type="general-purpose", prompt="[segment-generator instructions] + seg A task...")
# Segment B uses extracted frame from A
result_B = Task(subagent_type="general-purpose", prompt="[segment-generator instructions] + seg B task...")
```
### Pipeline Architecture
#### Scene and Segment Model
Videos are structured hierarchically:
- Scenes contain one or more segments
- Each scene has a generated starting keyframe (new visual context)
- Segments within a scene chain via extracted frames (seamless continuity)
- Transitions between scenes are applied programmatically (cut, fade, dissolve)
```
Scene 1 (20 sec target → 3 segments)
├── Keyframe: scene-01-start.png (GENERATED via keyframe-generator)
├── Segment A (8 sec) → extract frame (via segment-generator)
├── Segment B (8 sec) → extract frame (via segment-generator)
└── Segment C (4 sec) (via segment-generator, no extraction)

        [TRANSITION: fade/cut/dissolve]

Scene 2 (8 sec target → 1 segment)
├── Keyframe: scene-02-start.png (GENERATED via keyframe-generator)
└── Segment A (8 sec) (via segment-generator)
```
Why this model:
- Scenes = narrative units (different camera, location, or perspective)
- Segments = technical chunks needed due to 8-second generation limit
- Keyframes generated per scene (not per segment) - establishes visual context
- Transitions between scenes are real cinematic choices (not just frame chaining)
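To make "applied programmatically" concrete, the sketch below shows one way a fade between two finished scene files could be rendered with FFmpeg's xfade filter. This is an illustration, not one of this skill's scripts; the paths and durations are hypothetical, and audio handling is omitted:

```python
# fade_join.py -- illustrative sketch of a programmatic scene transition.
# Requires ffmpeg on PATH; both inputs must share resolution and frame rate.
import subprocess

SCENE_A = "scene-01/scene.mp4"  # hypothetical paths
SCENE_B = "scene-02/scene.mp4"
SCENE_A_DURATION = 20           # seconds; must match the real clip length

subprocess.run([
    "ffmpeg", "-y",
    "-i", SCENE_A, "-i", SCENE_B,
    # Start a 1-second crossfade 1 second before scene A ends.
    "-filter_complex",
    f"xfade=transition=fade:duration=1:offset={SCENE_A_DURATION - 1}",
    "output-faded.mp4",
], check=True)
```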
### Workflow Phases
#### Phase 0: Setup Check
1. Navigate to https://labs.google/fx/flow (use MCP directly for initial check)
2. Handle cookie consent if needed
3. Verify login status (look for project list or user avatar)
4. If not logged in, guide user through login
5. Create a new project or use existing one
Note: This phase is done by the main agent directly to verify MCP is working.
#### Phase 1: Production Philosophy (REQUIRED)
Create both files before proceeding:
- {output_dir}/philosophy.md
- {output_dir}/style.json
philosophy.md template:
```markdown
# Production Philosophy: [Project Name]

## Visual Identity
- **Art Style**: [e.g., cinematic realistic, anime, painterly]
- **Color Palette**: [primary colors, mood, temperature]
- **Lighting**: [natural, dramatic, soft, high-contrast]
- **Composition**: [rule of thirds, centered, dynamic angles]

## Motion Language
- **Movement Quality**: [smooth/fluid, dynamic/energetic, subtle/minimal]
- **Pacing**: [fast cuts, slow contemplative, rhythmic]
- **Camera Style**: [static, tracking, handheld, cinematic sweeps]

## Subject Consistency
- **Characters/Products**: [detailed descriptions]
- **Environment**: [setting details]
- **Props/Elements**: [recurring visual elements]

## Constraints
- **Avoid**: [unwanted elements]
- **Maintain**: [elements that must stay consistent]
```
style.json template:
```json
{
  "project_name": "Project Name",
  "visual_style": {
    "art_style": "description",
    "color_palette": "description",
    "lighting": "description",
    "composition": "description"
  },
  "motion_language": {
    "movement_quality": "description",
    "pacing": "description",
    "camera_style": "description"
  },
  "subject_consistency": {
    "main_subject": "detailed description",
    "environment": "detailed description"
  },
  "constraints": {
    "avoid": ["list", "of", "things"],
    "maintain": ["list", "of", "things"]
  }
}
```
CHECKPOINT: Get user approval before proceeding.
#### Phase 2: Scene Breakdown (REQUIRED)
Create {output_dir}/scene-breakdown.md:
```markdown
# Scene Breakdown: [Project Name]

## Overview
- **Total Duration**: [X seconds]
- **Number of Scenes**: [N]
- **Segment Duration**: 8 seconds (Flow Veo limit)
- **Video Type**: [promotional/narrative/educational/etc.]

---

## Scene 1: [Title]
**Duration**: [X seconds] → [ceil(X/8)] segments
**Purpose**: [What this scene communicates]
**Transition to Next**: [cut/fade/dissolve/wipe]

**Starting Keyframe**:
[Detailed visual description for the generated keyframe that starts this scene]

**Segments**:
1. **Seg A** (0-8s): [Motion description for first 8 seconds]
2. **Seg B** (8-16s): [Motion description for next 8 seconds]
3. **Seg C** (16-Xs): [Motion description for remaining seconds]

**Camera**: [static/tracking/pan/zoom/POV]

---

## Scene 2: [Title]
**Duration**: [X seconds] → [ceil(X/8)] segments
**Purpose**: [What this scene communicates]
**Transition to Next**: [null - last scene]

**Starting Keyframe**:
[Detailed visual description - this is a NEW scene so needs its own keyframe]

**Segments**:
1. **Seg A** (0-8s): [Motion description]

**Camera**: [camera style]

---
```
Planning Guidelines:
| When to Create a New Scene |
|---|
| Camera angle/perspective changes significantly |
| Location or setting changes |
| Time jump occurs |
| Subject/focus changes |
| You want a cinematic transition (fade, dissolve) |
| When to Add Segments (Same Scene) |
|---|
| Continuous action exceeds 8 seconds |
| Same camera perspective continues |
| No narrative break needed |
Segment Calculation: `segments_needed = ceil(scene_duration / 8)`
| Scene Duration | Segments Needed |
|---|---|
| 1-8 seconds | 1 |
| 9-16 seconds | 2 |
| 17-24 seconds | 3 |
| 25-32 seconds | 4 |
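For reference, the same calculation in Python:

```python
import math

def segments_needed(scene_duration: int, segment_duration: int = 8) -> int:
    """Number of segments required to cover a scene of the given length."""
    return math.ceil(scene_duration / segment_duration)

assert segments_needed(8) == 1
assert segments_needed(9) == 2
assert segments_needed(20) == 3  # e.g. the 20-second Scene 1 above
```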
CHECKPOINT: Get user approval before proceeding.
#### Phase 3: Pipeline Generation (REQUIRED)
Create {output_dir}/pipeline.json:
Pipeline Schema v3.0:
```json
{
  "version": "3.0",
  "project_name": "project-name",
  "config": {
    "segment_duration": 8
  },
  "metadata": {
    "created_at": "ISO timestamp",
    "philosophy_file": "philosophy.md",
    "style_file": "style.json",
    "scene_breakdown_file": "scene-breakdown.md"
  },
  "assets": {
    "backgrounds": {
      "<id>": {
        "prompt": "Detailed description...",
        "output": "assets/backgrounds/<id>.png",
        "status": "pending"
      }
    },
    "characters": {
      "<id>": {
        "prompt": "Detailed description...",
        "output": "assets/characters/<id>.png",
        "status": "pending"
      }
    }
  },
  "scenes": [
    {
      "id": "scene-01",
      "title": "Scene Title",
      "duration_target": 20,
      "transition_to_next": "cut",
      "first_keyframe": {
        "prompt": "Detailed visual description for scene start...",
        "output": "keyframes/scene-01-start.png",
        "status": "pending"
      },
      "segments": [
        {
          "id": "seg-01-A",
          "motion_prompt": "Motion description for first 8 seconds...",
          "output_video": "scene-01/seg-A.mp4",
          "status": "pending"
        },
        {
          "id": "seg-01-B",
          "motion_prompt": "Continuing motion for next 8 seconds...",
          "output_video": "scene-01/seg-B.mp4",
          "status": "pending"
        },
        {
          "id": "seg-01-C",
          "motion_prompt": "Final motion segment...",
          "output_video": "scene-01/seg-C.mp4",
          "status": "pending"
        }
      ]
    },
    {
      "id": "scene-02",
      "title": "Different Scene",
      "duration_target": 8,
      "transition_to_next": null,
      "first_keyframe": {
        "prompt": "New visual context description...",
        "output": "keyframes/scene-02-start.png",
        "status": "pending"
      },
      "segments": [
        {
          "id": "seg-02-A",
          "motion_prompt": "Motion description...",
          "output_video": "scene-02/seg-A.mp4",
          "status": "pending"
        }
      ]
    }
  ]
}
```
Schema Notes:
- config.segment_duration: Flow Veo's max video length (8 seconds)
- scenes[].duration_target: Desired scene length → determines segment count: ceil(duration / 8)
- scenes[].transition_to_next: Transition to apply before next scene (cut, fade, dissolve, wipe, or null for last scene)
- scenes[].first_keyframe: Generated image to establish scene's visual context
- scenes[].segments[]: Technical video chunks that chain seamlessly within the scene
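Before execution begins, the plan can be checked against these rules. A minimal sketch, assuming the schema above and a pipeline.json in the current directory (the helper name and error wording are illustrative):

```python
# validate_pipeline.py -- sketch: verify segment counts and keyframe prompts.
import json
import math

def validate(pipeline_path: str = "pipeline.json") -> list[str]:
    with open(pipeline_path, encoding="utf-8") as f:
        pipeline = json.load(f)
    seg_len = pipeline["config"]["segment_duration"]
    problems = []
    for scene in pipeline["scenes"]:
        expected = math.ceil(scene["duration_target"] / seg_len)
        actual = len(scene["segments"])
        if actual != expected:
            problems.append(f"{scene['id']}: {actual} segments, expected {expected}")
        if not scene["first_keyframe"]["prompt"].strip():
            problems.append(f"{scene['id']}: empty keyframe prompt")
    return problems

if __name__ == "__main__":
    for problem in validate():
        print(problem)
```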
#### Veo Motion Prompt Guidelines (CRITICAL)
Motion prompts for video segments MUST follow this official Veo structure:
Structure: [Cinematography] + [Subject] + [Action] + [Context] + [Style & Ambiance]
| Component | Description | Example |
|---|---|---|
| Cinematography | Camera movement, shot type, lens | "Slow dolly push forward", "Wide tracking shot", "Close-up with shallow depth of field" |
| Subject | Main visual focus with details | "A massive silver-blue warship with glowing cyan engines", "A young woman's face" |
| Action | ONE primary motion/change | "rotates its turrets into firing position", "looks out the window at passing lights" |
| Context | Setting, environment, surroundings | "against a backdrop of colorful nebula in deep space", "inside a bus at night during a rainstorm" |
| Style & Ambiance | Mood, lighting, visual quality | "Tense pre-battle atmosphere, dramatic rim lighting, photorealistic cinematic quality" |
Prompt Requirements:
- Length: 100-150 words (3-6 sentences)
- ONE action per segment: Don't describe 5 things happening simultaneously
- Specific camera language: Use "dolly", "tracking", "pan", "crane", "push", "pull back" - not vague "camera moves"
- Motion focus: Describe what MOVES, not static descriptions
Good Example:
Slow dolly push forward through a massive fleet formation in deep space. Sleek silver-blue warships with angular hulls drift majestically past camera, their cyan engines pulsing with rhythmic blue light. Turrets on the nearest destroyer slowly rotate into firing position as shield generators flicker to life with crackling blue energy. The ships hold perfect V-formation against a backdrop of distant stars and colorful nebula. Tense pre-battle atmosphere, epic cinematic scale, photorealistic sci-fi with dramatic rim lighting on metal hulls.
Bad Example (too many actions, vague camera):
Battle erupts in full fury. Blue and red laser beams crisscross the frame. A battleship fires broadsides. Explosions ripple. Fighters weave between ships. Debris scatters everywhere. Camera tracks through chaos.
→ This has 6+ simultaneous actions, and "camera tracks through chaos" is vague.
Fixed Version:
Dynamic tracking shot following a massive human battleship as it unleashes a devastating broadside. Blue energy bolts erupt from its flanking cannons, streaking across the void toward an enemy cruiser. Orange explosions bloom on the target's shields, rippling with impact energy. Debris and sparks scatter into space. The battleship's hull fills the foreground, weapon ports flashing in sequence. Intense combat lighting with contrasting blue and orange, chaotic but readable action, cinematic sci-fi blockbuster quality.
Common Mistakes to Avoid:
| Mistake | Problem | Fix |
|---------|---------|-----|
| Multiple simultaneous actions | Veo can't render 5 things at once clearly | Focus on ONE primary action |
| Static descriptions | "Ships in space" describes an image, not video | Add motion: "Ships drift forward, engines pulsing" |
| Vague camera direction | "Camera moves dynamically" | Use specific: "Tracking shot following the ship" |
| Too short (30-50 words) | Lacks detail for quality output | Expand to 100-150 words |
| Missing style/mood | Generic output | Add atmosphere: "Tense, dramatic rim lighting" |
CHECKPOINT: Get user approval before proceeding.
#### Phase 4: Asset Execution (via sub-agents)
First, read the agent instructions:
```python
Read(".claude/agents/asset-generator.md")
```
For each asset in pipeline.json, spawn a general-purpose sub-agent with embedded instructions:
```python
Task(
    subagent_type="general-purpose",
    prompt='''
[Embed full contents of .claude/agents/asset-generator.md here]

---
NOW EXECUTE THIS TASK:
{
  "asset_id": "<asset_id>",
  "prompt": "<asset_prompt>",
  "output_path": "<full_output_path>",
  "project_dir": "<project_directory>"
}
''',
    description="Generate <asset_id> asset"
)
```
Parallel Execution: Assets are independent - spawn all sub-agents in parallel:
```python
# All assets can run simultaneously (embed same instructions in each)
Task(subagent_type="general-purpose", prompt="[asset-generator.md] + asset1 task", description="Generate asset 1")
Task(subagent_type="general-purpose", prompt="[asset-generator.md] + asset2 task", description="Generate asset 2")
Task(subagent_type="general-purpose", prompt="[asset-generator.md] + asset3 task", description="Generate asset 3")
```
After each sub-agent returns:
1. Parse the returned JSON
2. Update pipeline.json: Set asset's status to "completed" or "error"
3. If error, note it for user review
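A minimal sketch of that bookkeeping step, assuming the pipeline schema from Phase 3 and the sub-agent output contract above (the stored `error` field is an illustrative addition):

```python
import json

def record_asset_result(result: dict, pipeline_path: str = "pipeline.json") -> None:
    """Write a sub-agent's reported status back into pipeline.json."""
    with open(pipeline_path, encoding="utf-8") as f:
        pipeline = json.load(f)
    # Assets are grouped by category (backgrounds, characters, ...).
    for category in pipeline["assets"].values():
        entry = category.get(result["asset_id"])
        if entry is not None:
            entry["status"] = result["status"]          # "completed" or "error"
            if result["status"] == "error":
                entry["error"] = result.get("message")  # keep for user review
    with open(pipeline_path, "w", encoding="utf-8") as f:
        json.dump(pipeline, f, indent=2)
```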
CHECKPOINT: Review assets (read the image files), get user approval.
#### Phase 5: Scene Keyframe Generation (via sub-agents)
First, read the agent instructions:
```python
Read(".claude/agents/keyframe-generator.md")
```
For each scene in pipeline.json, spawn a general-purpose sub-agent with embedded instructions:
```python
Task(
    subagent_type="general-purpose",
    prompt='''
[Embed full contents of .claude/agents/keyframe-generator.md here]

---
NOW EXECUTE THIS TASK:
{
  "scene_id": "<scene_id>",
  "prompt": "<keyframe_prompt>",
  "output_path": "<full_keyframe_path>",
  "project_dir": "<project_directory>",
  "style_context": {
    "art_style": "<from style.json>",
    "color_palette": "<from style.json>",
    "lighting": "<from style.json>"
  }
}
''',
    description="Generate <scene_id> keyframe"
)
```
Parallel Execution: Keyframes are independent - spawn all in parallel:
```python
Task(subagent_type="general-purpose", prompt="[keyframe-generator.md] + scene-01 task", description="Generate scene-01 keyframe")
Task(subagent_type="general-purpose", prompt="[keyframe-generator.md] + scene-02 task", description="Generate scene-02 keyframe")
```
After each sub-agent returns:
1. Parse the returned JSON
2. Update pipeline.json: Set scene's first_keyframe.status to "completed" or "error"
CHECKPOINT: Review all scene keyframes (read the image files), get user approval.
#### Phase 6: Segment Execution (via sub-agents)
First, read the agent instructions:
```python
Read(".claude/agents/segment-generator.md")
```
For each scene in pipeline.json, spawn one sub-agent per segment in scene.segments.
IMPORTANT: Segments within a scene MUST run sequentially (frame chaining).
```python
# Scene 1 segments - SEQUENTIAL
seg_A_result = Task(
    subagent_type="general-purpose",
    prompt='''
[Embed full contents of .claude/agents/segment-generator.md here]

---
NOW EXECUTE THIS TASK:
{
  "segment_id": "seg-01-A",
  "scene_id": "scene-01",
  "motion_prompt": "<motion_prompt>",
  "start_frame_path": "<project_dir>/keyframes/scene-01-start.png",
  "output_video_path": "<project_dir>/scene-01/seg-A.mp4",
  "project_dir": "<project_directory>",
  "extract_end_frame": true,
  "end_frame_path": "<project_dir>/scene-01/extracted/after-seg-A.png"
}
''',
    description="Generate seg-01-A"
)

# Wait for seg_A to complete, then use its extracted frame
seg_B_result = Task(
    subagent_type="general-purpose",
    prompt='''
[Embed full contents of .claude/agents/segment-generator.md here]

---
NOW EXECUTE THIS TASK:
{
  "segment_id": "seg-01-B",
  "scene_id": "scene-01",
  "motion_prompt": "<motion_prompt>",
  "start_frame_path": "<project_dir>/scene-01/extracted/after-seg-A.png",
  "output_video_path": "<project_dir>/scene-01/seg-B.mp4",
  "project_dir": "<project_directory>",
  "extract_end_frame": true,
  "end_frame_path": "<project_dir>/scene-01/extracted/after-seg-B.png"
}
''',
    description="Generate seg-01-B"
)

# Last segment - no extraction needed
seg_C_result = Task(
    subagent_type="general-purpose",
    prompt='''
[Embed full contents of .claude/agents/segment-generator.md here]

---
NOW EXECUTE THIS TASK:
{
  "segment_id": "seg-01-C",
  "scene_id": "scene-01",
  "motion_prompt": "<motion_prompt>",
  "start_frame_path": "<project_dir>/scene-01/extracted/after-seg-B.png",
  "output_video_path": "<project_dir>/scene-01/seg-C.mp4",
  "project_dir": "<project_directory>",
  "extract_end_frame": false,
  "end_frame_path": null
}
''',
    description="Generate seg-01-C"
)
```
Cross-Scene Parallelization: different scenes can run in parallel since they don't share frames. Once the Scene 1 and Scene 2 keyframes are ready, both scenes' segment chains can run concurrently, but segments WITHIN each scene must remain sequential.
Execution Flow:

```
Scene 1 (3 segments) - sequential chain:
  seg-A: start=keyframe         → generate → extract frame
  seg-B: start=after-seg-A.png  → generate → extract frame
  seg-C: start=after-seg-B.png  → generate → (no extraction)

Scene 2 (1 segment) - can run in parallel with Scene 1:
  seg-A: start=keyframe         → generate → (no extraction)
```
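The same ordering expressed as a sketch, where `pipeline` is the parsed pipeline.json and `run_segment_generator` is a stand-in for the Task call shown above (illustrative only):

```python
import json

with open("pipeline.json", encoding="utf-8") as f:
    pipeline = json.load(f)

def run_segment_generator(**task):
    """Stand-in for the Task(...) sub-agent call; returns the sub-agent's JSON."""
    raise NotImplementedError  # illustrative only

for scene in pipeline["scenes"]:      # independent scenes may run in parallel
    start_frame = scene["first_keyframe"]["output"]
    for i, segment in enumerate(scene["segments"]):  # strictly sequential
        is_last = i == len(scene["segments"]) - 1
        result = run_segment_generator(
            segment_id=segment["id"],
            scene_id=scene["id"],
            motion_prompt=segment["motion_prompt"],
            start_frame_path=start_frame,
            output_video_path=segment["output_video"],
            extract_end_frame=not is_last,
        )
        if not is_last:
            start_frame = result["end_frame_path"]  # chain into the next segment
```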
After each sub-agent returns:
1. Parse the returned JSON
2. Update pipeline.json: Set segment's status to "completed" or "error"
CHECKPOINT: Get user approval on videos.
#### Phase 7: Final Concatenation
This phase is handled by the main agent directly (not sub-agents).
Use the scripts/merge_videos.py script to concatenate videos. This script uses moviepy and handles resolution differences automatically.
Merge Script Location: scripts/merge_videos.py
Usage:

```bash
python scripts/merge_videos.py -o <output_file> <input1> <input2> [input3] ...
```
Step 1: Concatenate segments within each scene (seamless)
For each scene, merge its segments:
```bash
# Scene 1: combine segments
cd {output_dir}
python {skill_dir}/scripts/merge_videos.py -o scene-01/scene.mp4 scene-01/seg-A.mp4 scene-01/seg-B.mp4
# Scene 2: combine segments
python {skill_dir}/scripts/merge_videos.py -o scene-02/scene.mp4 scene-02/seg-A.mp4 scene-02/seg-B.mp4 scene-02/seg-C.mp4 scene-02/seg-D.mp4
# Scene 3: combine segments
python {skill_dir}/scripts/merge_videos.py -o scene-03/scene.mp4 scene-03/seg-A.mp4 scene-03/seg-B.mp4
```
Step 2: Combine all scenes into final video
Merge all scene videos into the final output:
```bash
python {skill_dir}/scripts/merge_videos.py -o output.mp4 scene-01/scene.mp4 scene-02/scene.mp4 scene-03/scene.mp4
```
Script Features:
- Automatically resizes videos to match the first video's resolution
- Handles any number of input videos (2 or more)
- Uses libx264 codec for compatibility
- Returns JSON status with duration and resolution info
Script Options:
| Option | Description |
|--------|-------------|
| -o, --output | Output video file path (required) |
| --codec | Video codec (default: libx264) |
| --fps | Output FPS (default: first video's FPS) |
| --no-resize | Don't resize videos to match first |
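The real script ships with the skill; for orientation only, here is a stripped-down sketch of its core operation, assuming moviepy 1.x is installed (the actual script additionally handles FPS, the --no-resize flag, and JSON status output):

```python
# Minimal core of a moviepy-based merge, for illustration only.
from moviepy.editor import VideoFileClip, concatenate_videoclips

def merge(inputs: list[str], output: str) -> None:
    clips = [VideoFileClip(path) for path in inputs]
    # Resize everything to the first clip's resolution so concatenation
    # produces a uniform stream.
    target = clips[0].size
    clips = [c if c.size == target else c.resize(newsize=target) for c in clips]
    final = concatenate_videoclips(clips)
    final.write_videofile(output, codec="libx264")

merge(["scene-01/seg-A.mp4", "scene-01/seg-B.mp4"], "scene-01/scene.mp4")
```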
Step 3: Clean up and finalize
- Remove intermediate files (scene.mp4 per scene) - optional
- Update pipeline.json to mark project complete
Final output: {output_dir}/output.mp4
### Output Directory Structure

```
{output_dir}/
├── philosophy.md
├── style.json
├── scene-breakdown.md
├── pipeline.json
├── output.mp4               <- FINAL VIDEO (with transitions)
├── assets/
│   ├── characters/
│   └── backgrounds/
├── keyframes/
│   ├── scene-01-start.png   <- Generated (scene 1 start)
│   └── scene-02-start.png   <- Generated (scene 2 start)
├── scene-01/
│   ├── seg-A.mp4            <- Segment videos
│   ├── seg-B.mp4
│   ├── seg-C.mp4
│   ├── scene.mp4            <- Concatenated scene (intermediate)
│   └── extracted/           <- Internal extracted frames
│       ├── after-seg-A.png
│       └── after-seg-B.png
├── scene-02/
│   ├── seg-A.mp4
│   └── scene.mp4
└── ...
```
Key Points:
- keyframes/ contains only generated keyframes (one per scene)
- scene-XX/extracted/ contains extracted frames (internal, for segment chaining)
- scene-XX/scene.mp4 is the intermediate concatenated scene (before transitions)
### TodoWrite Template
1. Check MCP Playwright availability
2. Navigate to Flow and verify login
3. Create philosophy.md
4. Create style.json
5. Get user approval on philosophy
6. Create scene-breakdown.md (with scenes and segments)
7. Get user approval on scene breakdown
8. Create pipeline.json (v3.0 with nested segments)
9. Get user approval on pipeline
10. Spawn asset-generator sub-agents (parallel)
11. Update pipeline.json with asset results
12. Review assets, get user approval
13. Spawn keyframe-generator sub-agents (parallel)
14. Update pipeline.json with keyframe results
15. Review keyframes, get user approval
16. Spawn segment-generator sub-agents (sequential per scene, parallel across scenes)
17. Update pipeline.json with segment results
18. Get user approval on videos
19. Concatenate segments within each scene (scripts/merge_videos.py)
20. Concatenate scenes with transitions into output.mp4
21. Provide final summary
### Error Handling
When a sub-agent returns an error:
1. Log the error in pipeline.json (update status to "error", add error message)
2. Inform the user of the failure
3. Offer to retry: spawn a new sub-agent for the failed task
4. Continue with other independent tasks if possible
### Technical Specs
| Parameter | Value |
|---|---|
| Segment Duration | 8 seconds per generation (Flow Veo limit) |
| Image Resolution | Up to 1024x1024 (Nano Banana Pro) |
| Video Resolution | Up to 1080p (4K with AI Ultra) |
| Rate Limiting | Credits-based (100 credits per Quality video) |
| GPU Required | None (cloud-based) |
| Image Model | Nano Banana Pro |
| Video Model | Veo 3.1 - Quality |
| Outputs per Prompt | 1 |
Key Terminology:
- Scene = A narrative/cinematic unit (any duration). Represents a continuous shot or distinct visual context. Each scene requires a generated starting keyframe.
- Segment = A technical 8-second video chunk within a scene. Multiple segments chain together seamlessly via extracted frames to form longer scenes.
- Sub-agent = Isolated agent instance spawned via Task tool. Has fresh context, returns result to main agent.
### Troubleshooting
| Issue | Solution |
|---|---|
| Sub-agent returns error | Check error message, retry with fresh sub-agent |
| Rate limited / Out of credits | Wait or upgrade subscription, then retry |
| Generation stuck | Sub-agent will timeout and return error |
| File not found | Check .playwright-mcp/ directory manually |
| Pipeline out of sync | Re-read pipeline.json, update status fields |
| Not logged in | Guide user to log in at labs.google/fx/flow |
| Generation mode wrong | Verify correct mode selected in dropdown |
# README.md
## Flow Video Producer Skill
A Claude Code / OpenCode skill for AI video production using Google Flow via MCP Playwright browser automation. Creates any video type: promotional, educational, narrative, social media, animations, game trailers, music videos, and more.
### Example Output
Created with one prompt: "photorealistic battlefield, first person"
Generated a 24-second continuous shot, exceeding Flow's 8-second per-generation limit. The skill automatically broke the video down into 3 scenes, chained video segments with extracted keyframes for seamless continuity, and concatenated them into a single fluid output.
### Installation

```
/plugin marketplace add zysilm-ai/flow-video-producer-skill
```

Or manually clone the repository into ~/.claude/skills/ or .claude/skills/.
### Quick Start
Simply describe what video you want:
```
You: Create a first-person battlefield experience

Claude: I'll help you create that video. Let me start by establishing
a Production Philosophy and breaking down the scenes...
```
Claude will:
- Auto-install MCP Playwright if missing
- Navigate to Flow and check login status
- Guide you through the production workflow
- Generate assets, keyframes, and videos with your approval
- Concatenate final output
Prerequisites: Claude Code or OpenCode CLI, Google account with Flow access
### Overview
This skill guides you through creating professional AI-generated videos with a structured, iterative workflow:
1. Production Philosophy - Define visual style, motion language, and narrative approach
2. Scene Breakdown - Decompose the video into scenes with motion requirements
3. Pipeline Generation - Create detailed prompts for all assets and scenes
4. Asset Generation - Create backgrounds and character references
5. Keyframe Generation - Generate the first keyframe to establish visual style
6. Scene Execution - Generate videos sequentially, extracting last frames for continuity
7. Final Concatenation - Combine all scene videos into a single output
The philosophy-first approach ensures visual coherence across all scenes.
### Key Features
- Cloud-Based - No GPU required; uses Flow's Veo 3.1 for video and Nano Banana Pro for images
- MCP Automation - Claude directly controls browser via MCP Playwright
- Self-Healing - Adapts to UI changes through semantic understanding
- Video-First Pipeline - Perfect visual continuity between scenes
- Zero Setup - MCP Playwright auto-installs if missing, just log in to Google
### Architecture

```
Claude reads pipeline.json
        |
Claude -> MCP Playwright -> Flow Web Interface
        |
Claude updates pipeline.json status
        |
Claude moves downloads to correct output paths
        |
Claude concatenates videos to output.mp4
```
Benefits:
- Self-healing: Claude adapts to UI changes by semantic understanding
- No brittle CSS selectors that break when Flow updates
- Simpler codebase - no Python Playwright code to maintain
- Real-time adaptation to page state
### Supported Video Types
- Promotional - Product launches, brand stories, ads
- Educational - Tutorials, explainers, courses
- Narrative - Short films, animations, music videos
- Social Media - Platform-optimized content (TikTok, Reels, Shorts)
- Corporate - Demos, presentations, training
- Game Trailers - Action sequences, atmosphere, gameplay hints
- Immersive - First-person experiences, POV content
### Pipeline Modes
| Mode | Description | Best For |
|---|---|---|
| Video-First (Recommended) | Generate first keyframe only, then videos sequentially. Last frame of each video becomes next scene's start. | Visual continuity between scenes |
| Keyframe-First | Generate all keyframes independently, then videos between them. | Precise control over end poses |
### Output Structure

```
output/{project-name}/
├── philosophy.md          # Production philosophy
├── style.json             # Style configuration
├── scene-breakdown.md     # Scene plan
├── pipeline.json          # Execution pipeline
├── output.mp4             # FINAL CONCATENATED VIDEO
├── assets/
│   ├── characters/
│   └── backgrounds/
├── keyframes/
│   ├── KF-A.png           # First keyframe (generated)
│   ├── KF-B.png           # Extracted from scene-01
│   └── KF-C.png           # Extracted from scene-02
├── scene-01/
│   └── video.mp4
├── scene-02/
│   └── video.mp4
└── scene-03/
    └── video.mp4
```
### Technical Specs
| Parameter | Value |
|---|---|
| Video Duration | 5-8 seconds per generation |
| Video Model | Veo 3.1 - Quality |
| Image Model | Nano Banana Pro |
| Image Resolution | Up to 1024x1024 |
| Video Resolution | Up to 1080p |
| GPU Required | None (cloud-based) |
### MCP Tools Used

| Tool | Purpose |
|---|---|
| browser_navigate | Go to URL |
| browser_snapshot | Get page accessibility tree |
| browser_click | Click element by ref |
| browser_type | Type text into input |
| browser_file_upload | Upload keyframes |
| browser_wait_for | Wait for generation |
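A typical generation round-trip, written in the same illustrative pseudocode style as SKILL.md's examples (parameter names are approximate; element refs come from the latest snapshot and change whenever the page updates):

```python
# Illustrative tool-call sequence, not literal Python.
browser_navigate(url="https://labs.google/fx/flow")
snapshot = browser_snapshot()                   # locate refs for inputs/buttons
browser_type(ref=prompt_box_ref, text=motion_prompt)
browser_file_upload(paths=[start_frame_path])   # attach the starting keyframe
browser_click(ref=create_button_ref)            # "Erstellen" (Create)
browser_wait_for(text="Download")               # block until generation finishes
```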
### Troubleshooting
| Issue | Solution |
|---|---|
| Cookie consent page | Click "Accept all" button |
| Not logged in | Log in to Google manually in browser |
| Generation stuck | Wait longer, check snapshot for progress |
| Download not working | Try clicking download button again |
| Element ref not found | Take new snapshot, refs change on page update |
| Rate limited | Wait until quota resets |
### Directory Structure

```
flow-video-producer-skill/
├── SKILL.md               # Claude Code skill instructions
├── README.md              # This file
├── references/
│   ├── prompt-engineering.md
│   ├── style-systems.md
│   └── troubleshooting.md
└── output/                # Generated projects
    └── {project-name}/
        ├── pipeline.json
        ├── output.mp4
        └── ...
```
### Contributing
Contributions welcome! Areas for improvement:
- Additional video generation backends
- Audio generation integration
- Batch processing tools
- More video styles and templates
### License
MIT License - See LICENSE.txt
### Acknowledgments
- Claude Code - AI coding assistant
- OpenCode - Open source AI coding agent
- MCP Playwright - Browser automation via MCP
- Google Flow - AI video/image generation
# Supported AI Coding Agents
This skill is compatible with the SKILL.md standard and works with all major AI coding agents. Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.