npx skills add FundCore/fundcore-agent-skills --skill "ai-talking-head"
Install specific skill from multi-skill repository
# Description
Specialized skill for AI talking head and lip-sync video generation. Use when you need presenter videos, UGC-style content, or lip-synced avatars. Triggers on: talking head, presenter video, lip sync, UGC video. Outputs professional talking head videos.
# SKILL.md
name: ai-talking-head
description: "Specialized skill for AI talking head and lip-sync video generation. Use when you need presenter videos, UGC-style content, or lip-synced avatars. Triggers on: talking head, presenter video, lip sync, UGC video. Outputs professional talking head videos."
AI Talking Head
Generate talking head videos, presenter content, and lip-synced videos.
Use this skill when: You need a person (real or AI) talking to camera.
Route here from: ai-creative-workflow, ai-creative-strategist, or direct requests.
Why This Skill Exists
The problem: Talking head videos are among the most persuasive content formats, but:
1. Recording yourself is time-consuming and requires confidence
2. Professional presenters are expensive ($500-5000+ per video)
3. UGC creators charge $100-500 per post and may not match your brand
4. Iterating on scripts means re-filming everything
5. Scaling personalized video is nearly impossible manually
The solution: AI talking heads that:
- Generate professional presenter videos in minutes
- Let you iterate on scripts without re-recording
- Create unlimited variants for A/B testing
- Maintain consistent brand presenter identity
- Scale personalized outreach cost-effectively
The game-changer: Combining avatar generation + lip-sync lets you:
- Create a consistent "brand spokesperson"
- Update any script without re-filming
- Test multiple presenter styles quickly
- Produce video content at 10x the speed
Presenter Style Exploration (Before Generation)
Critical insight from ai-creative-strategist: Don't generate with one style and hope it works. Explore genuinely DIFFERENT presenter styles first.
The Style Exploration Process
STEP 1: GENERATE 4-5 DIFFERENT PRESENTER STYLES
This is NOT: Same person with different clothes
This IS: Fundamentally different presenter archetypes that each tell a different story
[YOUR BRAND] - Style Exploration
Generate presenter concepts for these 5 directions:
1. CORPORATE AUTHORITY
- Demographic: 35-50, professional appearance
- Setting: Modern office, corporate environment
- Wardrobe: Business professional, suit/blazer
- Energy: Confident, measured, authoritative
- Vibe: "Trust the expert"
2. RELATABLE FRIEND
- Demographic: 25-40, approachable look
- Setting: Home office, kitchen, casual space
- Wardrobe: Smart casual, comfortable
- Energy: Warm, conversational, genuine
- Vibe: "Let me share what worked for me"
3. ENERGETIC CREATOR
- Demographic: 22-35, creator aesthetic
- Setting: Ring light setup, content studio
- Wardrobe: Trendy casual, branded
- Energy: High, dynamic, enthusiastic
- Vibe: "You HAVE to try this"
4. EXPERT EDUCATOR
- Demographic: 30-55, credible appearance
- Setting: Study, library, professional backdrop
- Wardrobe: Smart casual, glasses optional
- Energy: Calm, explanatory, helpful
- Vibe: "Let me explain how this works"
5. LIFESTYLE ASPIRATIONAL
- Demographic: 28-45, aspirational look
- Setting: Beautiful home, travel location, luxury
- Wardrobe: Elevated casual, tasteful
- Energy: Relaxed confidence, success aura
- Vibe: "This is what my life looks like"
STEP 2: IDENTIFY WINNER
After generating style exploration:
REVIEW each presenter style:
Which presenter:
- Best matches brand voice?
- Would audience trust most?
- Fits the content type?
- Has right energy level?
- Would work across multiple videos?
WINNER: [Selected style]
BECAUSE: [Why this style wins for this brand/use case]
STEP 3: EXTRACT PRESENTER PRINCIPLES
Once winner identified:
WINNING STYLE EXTRACTION
Demographics:
- Age range: [X-X]
- Gender: [if specific]
- Ethnicity: [if specific]
- Overall look: [descriptors]
Environment:
- Primary setting: [where they present from]
- Background elements: [what's visible]
- Lighting style: [natural/studio/mixed]
Wardrobe:
- Style: [formal/casual/etc.]
- Colors: [palette]
- Accessories: [if any]
Delivery:
- Energy level: [1-10]
- Speaking pace: [slow/medium/fast]
- Hand gestures: [minimal/moderate/expressive]
- Eye contact: [direct to camera always]
Audio:
- Voice tone: [warm/authoritative/energetic]
- Pacing: [conversational/punchy/measured]
STEP 4: APPLY ACROSS CONTENT
Apply the extracted principles to every video:
- All future videos stay consistent
- Same presenter = brand recognition
- Vary the script, not the presenter
Presenter Archetype Deep Dives
Corporate Authority
When to use: B2B, financial services, healthcare, enterprise SaaS, professional services
Visual Formula:
[Man/Woman] in [30s-50s], [silver/dark hair], wearing [tailored blazer/suit],
in [modern glass office/conference room with city view], [warm professional lighting],
[confident composed expression], [seated at desk OR standing with slight lean],
[direct eye contact with camera], [subtle hand gestures], corporate executive style
Setting Options:
- Corner office with city view
- Modern conference room
- Executive desk with minimal decor
- Standing at presentation screen
- Seated in designer chair
Wardrobe Options:
- Tailored navy blazer over white shirt
- Grey suit, no tie (modern)
- Classic suit with subtle tie
- Blazer over turtleneck (thought leader)
- Professional dress (solid colors)
Energy Markers:
- Measured pace
- Deliberate movements
- Confident pauses
- Minimal but purposeful gestures
- Assured vocal tone
Relatable Friend (UGC Style)
When to use: DTC brands, consumer products, wellness, beauty, lifestyle
Visual Formula:
[Friendly man/woman] in [25-40], wearing [casual but put-together outfit],
in [bright modern apartment/kitchen/home office], [natural window light],
[genuine warm smile], [relaxed comfortable posture], [talking to camera like
a friend], [natural hand movements], authentic UGC creator style
Setting Options:
- Bright kitchen counter
- Cozy living room couch
- Home office with plants
- Bedroom getting-ready setup
- Outdoor patio/balcony
Wardrobe Options:
- Cozy sweater/cardigan
- Simple t-shirt
- Casual button-down
- Loungewear (if brand appropriate)
- Athleisure
Energy Markers:
- Conversational rhythm
- Natural pauses ("honestly?", "okay so...")
- Expressive facial reactions
- Genuine enthusiasm without over-selling
- Relatable body language
UGC Script Patterns:
DISCOVERY: "Okay so I found this [product] and I'm obsessed..."
REVIEW: "So I've been using [product] for [time] and here's my honest take..."
COMPARISON: "I used to use [old product] but then I tried [new product]..."
TRANSFORMATION: "Before [product] I was [problem]. Now? [result]."
Energetic Creator
When to use: Gen-Z products, entertainment, gaming, trendy DTC, social apps
Visual Formula:
[Young energetic creator] in [22-35], [colorful trendy outfit], in [content
studio with ring light/neon lights], [bright dynamic lighting], [animated
expressions], [lots of movement and gestures], [high energy delivery],
[fast-paced enthusiastic style], YouTube/TikTok creator aesthetic
Setting Options:
- Ring light setup visible
- LED/neon accent lighting
- Streaming/gaming setup
- Colorful backdrop
- Outdoor action setting
Wardrobe Options:
- Graphic tees
- Bold colors
- Branded merch
- Trendy streetwear
- Statement accessories
Energy Markers:
- Fast-paced delivery
- Big expressions
- Lots of hand movement
- Pattern interrupts
- Enthusiasm at 10
Creator Script Patterns:
HOOK: "STOP scrolling. This is important."
REVEAL: "I literally just discovered [thing] and I'm freaking out."
CHALLENGE: "I bet you can't guess what [product] does."
REACTION: "[reaction to trying product]... WAIT what?!"
Expert Educator
When to use: Online courses, professional services, B2B explainers, tutorials
Visual Formula:
[Knowledgeable expert] in [30-55], [smart casual or academic style],
in [home study/office with books/whiteboard], [balanced lighting],
[thoughtful composed expression], [explaining with purposeful gestures],
[patient instructive tone], educator/thought leader style
Setting Options:
- Study with bookshelves
- Office with credentials visible
- Whiteboard/screen behind
- Standing at presentation
- Desk with relevant props
Wardrobe Options:
- Button-down shirt
- Blazer over casual shirt
- Sweater over collared shirt
- Glasses (authority signal)
- Minimal accessories
Energy Markers:
- Patient pace
- Teaching rhythm
- Logical structure
- Illustrative gestures
- "Here's what matters" moments
Lifestyle Aspirational
When to use: Luxury brands, high-ticket services, aspirational DTC, travel, real estate
Visual Formula:
[Elegant successful person] in [30s-50s], [elevated casual attire],
in [beautiful interior/scenic location], [golden hour OR designer lighting],
[relaxed confident demeanor], [speaking with quiet confidence], [minimal
but graceful movement], aspirational lifestyle aesthetic
Setting Options:
- Designer living room
- Travel location (balcony view)
- Luxury car interior
- High-end restaurant/hotel
- Yacht/beach/resort
Wardrobe Options:
- Designer casual
- Linen/natural fabrics
- Neutral luxury palette
- Subtle jewelry/watch
- Effortlessly elegant
Energy Markers:
- Relaxed confidence
- No rushing
- "I have time" energy
- Subtle smile
- Quiet success vibes
Video Model Roster (Quality Winners)
Generate presenter videos with ALL THREE models and present the outputs for selection:
| Model | Owner | Speed | Strengths |
|---|---|---|---|
| Sora 2 | openai | ~80s | Excellent general quality, good faces |
| Veo 3.1 | google | ~130s | Native audio generation, natural movement |
| Kling v2.5 Turbo Pro | kwaivgi | ~155s | Best for people/motion, most realistic |
Strategy: Run the same prompt through all 3 models → User picks the best output.
Model Selection Guide
FOR MAXIMUM REALISM (people quality):
→ Kling v2.5 Turbo Pro (best faces, most natural movement)
FOR SPEED + QUALITY BALANCE:
→ Sora 2 (fastest, still good quality)
FOR BUILT-IN AUDIO:
→ Veo 3.1 (generates audio with video)
FOR UGC AUTHENTICITY:
→ Kling v2.5 (handles casual movements well)
FOR CORPORATE/FORMAL:
→ Sora 2 or Kling v2.5 (cleaner, more controlled)
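The selection guide above can be captured as a small lookup table. The sketch below is illustrative only: the priority keys (`realism`, `speed`, and so on) and the `pick_models` helper are hypothetical names, not part of the skill, while the model names are the ones used in this document's request examples.

```python
from typing import Dict, List

# Hypothetical priority keys mapped to the model names used in this document.
MODEL_FOR_PRIORITY: Dict[str, List[str]] = {
    "realism": ["kling-v2.5-turbo-pro"],
    "speed": ["sora-2"],
    "built_in_audio": ["veo-3.1"],
    "ugc": ["kling-v2.5-turbo-pro"],
    "corporate": ["sora-2", "kling-v2.5-turbo-pro"],
}


def pick_models(priority: str) -> List[str]:
    """Return the suggested model names for a priority, per the guide above."""
    key = priority.strip().lower()
    if key not in MODEL_FOR_PRIORITY:
        known = ", ".join(sorted(MODEL_FOR_PRIORITY))
        raise ValueError(f"Unknown priority {priority!r}; expected one of: {known}")
    # Return a copy so callers cannot mutate the shared table.
    return list(MODEL_FOR_PRIORITY[key])
```

For example, `pick_models("corporate")` returns both Sora 2 and Kling v2.5, which matches the "either works" guidance for formal content.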
Lip-Sync Model
For adding speech to existing videos:
| Model | Use | Cost | Speed | Quality |
|---|---|---|---|---|
| Kling Lip-Sync | Add voiceover to any video | ~$0.20 | ~1min | Excellent |
When to use Lip-Sync:
- You have a great presenter video but need different script
- Client wants to change messaging after video generation
- Creating personalized versions of same base video
- Adding voiceover to product demo videos
- Dubbing content for different languages
Use Cases Deep Dive
1. Lip-Sync Overlay
Best for: Adding voiceover to existing video, dubbing, personalization
Input Requirements:
- Video with visible face (front-facing works best)
- Audio file (MP3, WAV) OR text script
Workflow:
{
"model_owner": "kwaivgi",
"model_name": "kling-lip-sync",
"Prefer": "wait",
"input": {
"video": "https://... (source video URL)",
"audio": "https://... (audio file URL)"
}
}
Or with text (uses built-in TTS):
{
"input": {
"video": "https://... (source video URL)",
"text": "Script text to speak"
}
}
Quality Tips:
- Source video should have face visible 70%+ of time
- Forward-facing shots work better than profiles
- Avoid videos with heavy face movement/turning
- Audio should be clear without background noise
- Script pacing should match natural speech
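The two lip-sync request shapes above differ only in whether `audio` or `text` is supplied. A minimal sketch of a builder follows; `build_lip_sync_request` is a hypothetical helper, and treating audio and text as mutually exclusive is an assumption drawn from the "Audio file OR text script" requirement rather than a documented constraint.

```python
from typing import Any, Dict, Optional


def build_lip_sync_request(
    video_url: str,
    audio_url: Optional[str] = None,
    text: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a lip-sync request dict in the shape shown in this section."""
    if not video_url:
        raise ValueError("video_url is required")
    # Assumption: exactly one of audio_url or text should be provided.
    if (audio_url is None) == (text is None):
        raise ValueError("Provide exactly one of audio_url or text")

    inputs: Dict[str, Any] = {"video": video_url}
    if audio_url is not None:
        inputs["audio"] = audio_url
    else:
        inputs["text"] = text

    return {
        "model_owner": "kwaivgi",
        "model_name": "kling-lip-sync",
        "input": inputs,
    }
```

The function only assembles the payload; how it is submitted depends on the client you use.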
2. AI Presenter Generation
Best for: Creating presenter content from scratch, brand spokesperson
Multi-Model Workflow:
// Sora 2
{
"model_owner": "openai",
"model_name": "sora-2",
"input": {
"prompt": "[presenter prompt]",
"aspect_ratio": "16:9",
"duration": 5
}
}
// Veo 3.1 (with native audio)
{
"model_owner": "google",
"model_name": "veo-3.1",
"input": {
"prompt": "[presenter prompt]",
"aspect_ratio": "16:9",
"generate_audio": true
}
}
// Kling v2.5
{
"model_owner": "kwaivgi",
"model_name": "kling-v2.5-turbo-pro",
"input": {
"prompt": "[presenter prompt]",
"aspect_ratio": "16:9",
"duration": 5
}
}
Then add lip-sync if specific script needed:
{
"model_owner": "kwaivgi",
"model_name": "kling-lip-sync",
"input": {
"video": "[generated video URL]",
"text": "[script text]"
}
}
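To run the same prompt through all three models, the request dicts above can be generated from one function. This is a sketch under the assumption that the field names shown in this section are accurate; `build_generation_requests` is a hypothetical helper and it does not send anything.

```python
from typing import Any, Dict, List


def build_generation_requests(
    prompt: str, aspect_ratio: str = "16:9", duration: int = 5
) -> List[Dict[str, Any]]:
    """Return one request dict per model, all sharing the same prompt."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    return [
        {
            "model_owner": "openai",
            "model_name": "sora-2",
            "input": {"prompt": prompt, "aspect_ratio": aspect_ratio, "duration": duration},
        },
        {
            # The Veo example in this document sets generate_audio and no duration.
            "model_owner": "google",
            "model_name": "veo-3.1",
            "input": {"prompt": prompt, "aspect_ratio": aspect_ratio, "generate_audio": True},
        },
        {
            "model_owner": "kwaivgi",
            "model_name": "kling-v2.5-turbo-pro",
            "input": {"prompt": prompt, "aspect_ratio": aspect_ratio, "duration": duration},
        },
    ]
```

Keeping the prompt in one place makes the three outputs directly comparable when the user picks a winner.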
3. UGC-Style Content
Best for: Authentic testimonials, product reviews, social proof
The UGC Formula:
[Relatable person] + [Casual setting] + [Natural lighting] +
[Authentic delivery] + [Genuine reaction] = Believable UGC
Prompt Template:
Friendly [demographic] sitting in [casual setting], natural window light,
holding/showing [product], genuine excited expression, talking directly to
camera like filming a selfie video, authentic UGC testimonial style, casual
comfortable body language, 5 seconds
UGC Authenticity Markers:
- Slightly imperfect framing
- Natural lighting (not studio)
- Casual wardrobe
- Real reactions, not posed
- Personal space as backdrop
- Eye contact with camera
4. Personal Brand Series
Best for: Thought leaders, course creators, coaches, consultants
Consistency Formula:
ESTABLISH ONCE, USE FOREVER:
- Same presenter appearance
- Same setting/background
- Same wardrobe style
- Same energy level
- Same lighting setup
Only change: Script and specific content
Series Prompt Template:
[Consistent presenter description - use same each time], [same setting],
[same lighting], [same wardrobe style], [same energy], discussing [new topic],
[consistent delivery style], 5 seconds
Script Mastery
Duration Calculation
| Word Count | Duration | Use Case |
|---|---|---|
| 15 words | ~5 seconds | Social hook |
| 30 words | ~10 seconds | Instagram Reel |
| 45 words | ~15 seconds | TikTok optimal |
| 60 words | ~20 seconds | Short testimonial |
| 90 words | ~30 seconds | Product explainer |
| 150 words | ~60 seconds | Full testimonial |
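Rule: ~150 words per minute (2.5 words per second) at a natural conversational pace. The rows up to 90 words assume a brisker ~3 words per second (~180 words per minute), so allow a little extra time if you pace those scripts at 150.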
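The words-per-minute rule translates directly into a duration estimate. The helper below is a minimal sketch; `estimate_duration_seconds` is a hypothetical name, and counting whitespace-separated tokens as words is a simplifying assumption.

```python
def estimate_duration_seconds(script: str, words_per_minute: float = 150.0) -> float:
    """Estimate spoken duration from word count at a given speaking pace."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    # Simplification: every whitespace-separated token counts as one word.
    word_count = len(script.split())
    return word_count * 60.0 / words_per_minute
```

Pass a higher `words_per_minute` for energetic creator-style delivery and a lower one for measured corporate delivery.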
Script Structures
HOOK-VALUE-CTA (15-30 seconds):
Hook (0-3 sec): [Attention-grabber - question, statement, or pattern interrupt]
Value (3-20 sec): [Main message, benefit, or story]
CTA (20-30 sec): [Clear next step]
PROBLEM-AGITATE-SOLVE (30-60 seconds):
Problem (0-10 sec): [Name the pain point]
Agitate (10-30 sec): [Make them feel it]
Solve (30-60 sec): [Present the solution + CTA]
BEFORE-AFTER (15-30 seconds):
Before (0-10 sec): [Life before product/solution]
After (10-25 sec): [Transformation/result]
CTA (25-30 sec): [How to get same result]
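Each structure above is a set of timed segments, so a per-segment word budget follows from the speaking pace. This sketch uses the segment timings listed above (taking the upper bound of each range) and assumes 150 words per minute; the dictionary keys and `word_budgets` are hypothetical names.

```python
from typing import Dict, List, Tuple

# (segment name, start second, end second), copied from the structures above.
SCRIPT_STRUCTURES: Dict[str, List[Tuple[str, int, int]]] = {
    "hook-value-cta": [("Hook", 0, 3), ("Value", 3, 20), ("CTA", 20, 30)],
    "problem-agitate-solve": [("Problem", 0, 10), ("Agitate", 10, 30), ("Solve", 30, 60)],
    "before-after": [("Before", 0, 10), ("After", 10, 25), ("CTA", 25, 30)],
}


def word_budgets(structure: str, words_per_minute: int = 150) -> Dict[str, int]:
    """Return the maximum word count per segment, rounded down."""
    segments = SCRIPT_STRUCTURES[structure]
    return {
        name: (end - start) * words_per_minute // 60
        for name, start, end in segments
    }
```

A three-second hook leaves room for about seven words at this pace, which is a useful constraint when drafting openers.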
Tone Templates
Professional/Corporate:
"[Name] here with [Company]. Today I want to share how [product/insight]
can help you [achieve outcome]. Here's what you need to know..."
Casual/UGC:
"Okay so I've been using [product] for [time] and honestly? I'm obsessed.
Here's why [specific benefit]. If you [problem], you need this."
Expert/Educational:
"One thing I see people get wrong about [topic] is [misconception].
Here's what actually works: [insight]. Let me show you..."
Energetic/Sales:
"Stop what you're doing. [Product] just changed everything. I'm serious -
[result] in [timeframe]. You HAVE to try this."
Aspirational:
"[Casual opening]. I wanted to share something that's completely transformed
[area of life]. [Product] gave me [result]. Here's how it works..."
Platform-Specific Optimization
TikTok/Reels (9:16)
Specs:
- Aspect Ratio: 9:16 (vertical)
- Duration: 15-30 seconds optimal
- Safe Zone: Keep face/text center 60%
Style Adjustments:
- Higher energy delivery
- Faster pacing
- Hook in first 1-2 seconds
- Pattern interrupts
- Jump cuts acceptable
- Casual/authentic feel
Prompt Modifier:
...[base prompt], filmed vertically like TikTok/Reels content,
energetic creator style, direct eye contact with camera
YouTube (16:9)
Specs:
- Aspect Ratio: 16:9 (landscape)
- Duration: 30-120 seconds
- Safe Zone: Standard letterbox
Style Adjustments:
- More measured pacing
- Can be longer form
- More professional setups accepted
- Room for B-roll integration
- Intro/outro structure
Prompt Modifier:
...[base prompt], widescreen YouTube style, professional yet engaging,
room for graphics/lower thirds
LinkedIn (1:1 or 16:9)
Specs:
- Aspect Ratio: 1:1 (square) or 16:9
- Duration: 30-60 seconds optimal
- Tone: Professional but personal
Style Adjustments:
- Professional appearance
- Business-appropriate setting
- Thought leadership tone
- Value-first messaging
- Credibility signals
Prompt Modifier:
...[base prompt], professional LinkedIn style, credible expert appearance,
business casual in modern office environment
Instagram Stories (9:16)
Specs:
- Aspect Ratio: 9:16
- Duration: 15 seconds max per segment
- Ephemeral feel
Style Adjustments:
- Casual, in-the-moment feel
- Can be "rougher" quality
- Direct audience address
- Personal/behind-scenes vibe
- Clear single message per story
Ads (Various)
Facebook/Instagram Ads:
- 1:1, 4:5, or 9:16
- 15-30 second optimal
- Hook in 0-3 seconds
- Clear CTA
YouTube Ads:
- 16:9
- 15-30 second (skippable) or 6 second (bumper)
- Brand visible throughout
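The organic-platform specs above can be encoded as data and checked before generation. The sketch below covers the four non-ad platforms only; the platform keys and `check_platform_fit` are hypothetical names, and the duration ranges are this document's suggested values, not hard platform limits.

```python
from typing import Dict, List, Optional, Tuple

# platform -> (allowed aspect ratios, suggested min seconds, suggested max seconds)
PLATFORM_SPECS: Dict[str, Tuple[Tuple[str, ...], Optional[int], int]] = {
    "tiktok_reels": (("9:16",), 15, 30),
    "youtube": (("16:9",), 30, 120),
    "linkedin": (("1:1", "16:9"), 30, 60),
    "instagram_stories": (("9:16",), None, 15),
}


def check_platform_fit(platform: str, aspect_ratio: str, duration_seconds: int) -> List[str]:
    """Return a list of mismatches against the suggested specs (empty if none)."""
    ratios, min_s, max_s = PLATFORM_SPECS[platform]
    issues: List[str] = []
    if aspect_ratio not in ratios:
        issues.append(
            f"{platform} expects aspect ratio {' or '.join(ratios)}, got {aspect_ratio}"
        )
    if min_s is not None and duration_seconds < min_s:
        issues.append(f"{duration_seconds}s is shorter than the suggested {min_s}s minimum")
    if duration_seconds > max_s:
        issues.append(f"{duration_seconds}s is longer than the suggested {max_s}s maximum")
    return issues
```

Returning a list of issues rather than raising lets the caller decide whether a mismatch is acceptable for a given campaign.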
Audio & Voice Considerations
When Using Veo 3.1 Native Audio
Strengths:
- Generates synchronized audio with video
- Natural ambient sounds
- Speech that matches lip movement
- Good for establishing scenes
Limitations:
- Less control over specific script
- Audio quality varies
- May need post-processing
When Adding Lip-Sync
Best Practices:
- Use high-quality audio recording
- Match energy level to video presenter
- Pace script to natural speaking rhythm
- Allow for breath pauses
- Keep sentences short (easier sync)
Voice-Over Tips
If recording your own VO for lip-sync:
- [ ] Record in quiet environment
- [ ] Use consistent distance from mic
- [ ] Match energy to presenter style
- [ ] Natural pauses between sentences
- [ ] Clear enunciation
- [ ] Export as MP3 or WAV
If using TTS (text input):
- [ ] Use punctuation for natural pauses
- [ ] Write phonetically for tricky words
- [ ] Keep sentences conversational length
- [ ] Test different phrasings
- [ ] Consider adding "..." for pauses
Execution Workflow
Step 1: Clarify Requirements
Before generating:
- [ ] What's the use case? (UGC, corporate, educational, etc.)
- [ ] What platform? (TikTok, YouTube, LinkedIn, ads)
- [ ] What aspect ratio? (9:16, 16:9, 1:1)
- [ ] What duration? (and word count)
- [ ] What presenter style? (see archetypes)
- [ ] What's the script/message?
- [ ] Need lip-sync to specific audio?
Step 2: Style Selection
If not predefined:
- [ ] Generate style exploration with 4-5 different presenter styles
- [ ] Present options to user
- [ ] Extract principles from winner
- [ ] Document for consistency
Step 3: Construct Prompt
Use this formula:
[PRESENTER DESCRIPTION] + [SETTING] + [LIGHTING] +
[EXPRESSION/ENERGY] + [ACTION] + [STYLE MODIFIER] + [DURATION]
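The prompt formula can be assembled programmatically so that no component is silently omitted. This is a minimal sketch; `build_presenter_prompt` is a hypothetical helper, and joining components with commas follows the prompt templates elsewhere in this document.

```python
def build_presenter_prompt(
    presenter: str,
    setting: str,
    lighting: str,
    expression: str,
    action: str,
    style_modifier: str,
    duration_seconds: int,
) -> str:
    """Join the formula's components, in order, into one comma-separated prompt."""
    components = {
        "presenter": presenter,
        "setting": setting,
        "lighting": lighting,
        "expression": expression,
        "action": action,
        "style_modifier": style_modifier,
    }
    # Reject blank components: vague prompts are a listed cause of generic output.
    missing = [name for name, value in components.items() if not value.strip()]
    if missing:
        raise ValueError(f"Missing prompt components: {', '.join(missing)}")
    parts = [value.strip() for value in components.values()]
    parts.append(f"{duration_seconds} seconds")
    return ", ".join(parts)
```

Requiring every component up front enforces the "always include lighting" and "be explicit about energy" advice from the troubleshooting table.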
Step 4: Multi-Model Generation
Run same prompt through:
1. Sora 2 (~80s)
2. Veo 3.1 (~130s)
3. Kling v2.5 (~155s)
Present all three to user for selection.
Step 5: Add Lip-Sync (If Needed)
If specific script delivery required:
1. User approves video from Step 4
2. Run through Kling Lip-Sync
3. Input: selected video + audio/text
4. Output: synced talking head
Step 6: Deliver & Iterate
## Talking Head Video Options
**Style:** [Archetype used]
**Platform:** [Target platform]
**Duration:** [X seconds]
### Option 1: Sora 2
[video URL]
Notes: [quality assessment]
### Option 2: Veo 3.1 (with audio)
[video URL]
Notes: [quality assessment]
### Option 3: Kling v2.5
[video URL]
Notes: [quality assessment]
**Select preferred video for lip-sync or final delivery.**
Quality Checklist
Technical Quality
- [ ] Face clearly visible throughout
- [ ] No uncanny valley artifacts
- [ ] Consistent appearance (no morphing)
- [ ] Smooth natural movement
- [ ] Appropriate resolution for platform
Presenter Quality
- [ ] Matches intended archetype
- [ ] Expression appropriate for message
- [ ] Energy level fits content type
- [ ] Wardrobe matches brand/context
- [ ] Setting supports message
Lip-Sync Quality (if applicable)
- [ ] Mouth movement matches audio
- [ ] Natural speech rhythm
- [ ] No obvious desync
- [ ] Head movement doesn't break sync
- [ ] Audio quality clear
Content Quality
- [ ] Script delivered clearly
- [ ] Pacing appropriate for platform
- [ ] Hook captures attention
- [ ] Message comes through
- [ ] CTA clear (if applicable)
Common Issues & Solutions
| Issue | Cause | Solution |
|---|---|---|
| Uncanny valley feel | Model limitations | Use Kling v2.5 for most realistic faces |
| Face morphing mid-video | Long duration | Keep videos shorter (5-10 sec), extend with cuts |
| Lip-sync drift | Audio/video mismatch | Use shorter scripts, clear enunciation |
| Wrong energy level | Prompt too vague | Be explicit about energy: "calm" vs "enthusiastic" |
| Generic stock presenter | No specific direction | Add detailed demographic and style descriptors |
| Setting doesn't match | Prompt conflict | Prioritize setting description, remove conflicts |
| Awkward hand movement | Unspecified gestures | Add gesture direction or specify "minimal movement" |
| Bad lighting | Missing lighting prompt | Always include lighting: "warm natural light" |
| Doesn't look like brand | No style consistency | Create and use presenter spec document |
| Audio quality poor | TTS limitations | Use recorded audio instead of text input |
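Several fixes in the table come down to keeping clips short and extending with cuts. One way to prepare for that is to split a long script into clip-sized chunks at sentence boundaries. The sketch below assumes 150 words per minute and a 10-second cap; `split_script_into_clips` is a hypothetical helper, and the sentence splitting is deliberately naive (it breaks after `.`, `!`, or `?`).

```python
import re
from typing import List


def split_script_into_clips(
    script: str, max_seconds: float = 10.0, words_per_minute: float = 150.0
) -> List[str]:
    """Group whole sentences into clips that fit within max_seconds."""
    max_words = max(1, int(max_seconds * words_per_minute / 60))
    # Naive sentence split: break on whitespace that follows ., !, or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", script.strip()) if s]

    clips: List[str] = []
    current: List[str] = []
    current_words = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new clip when adding this sentence would exceed the budget.
        if current and current_words + n > max_words:
            clips.append(" ".join(current))
            current, current_words = [], 0
        # A single sentence longer than the budget still gets its own clip.
        current.append(sentence)
        current_words += n
    if current:
        clips.append(" ".join(current))
    return clips
```

Each returned chunk can then be generated or lip-synced as its own short clip and joined with cuts in editing.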
Output Format
Style Exploration Output
## Presenter Style Exploration
**Brand/Project:** [Name]
**Use Case:** [What videos will be used for]
### Style 1: Corporate Authority
[video URL or generation]
- Demographic: [specifics]
- Setting: [description]
- Energy: [level]
### Style 2: Relatable Friend
[video URL or generation]
- Demographic: [specifics]
- Setting: [description]
- Energy: [level]
[...continue for all 5 styles...]
**Recommendation:** Style [X] best fits because [reasons]
**Feedback needed:** Which direction resonates?
Generated Video Output
## Talking Head Video Generated
**Style:** [Archetype]
**Platform:** [Target]
**Duration:** [X seconds]
### Model Outputs:
**Sora 2:** [URL]
**Veo 3.1:** [URL] (includes audio)
**Kling v2.5:** [URL]
**Prompt Used:**
> [full prompt for reference]
**Next Steps:**
- [ ] Select preferred video
- [ ] Add lip-sync to specific script (if needed)
- [ ] Request variation
- [ ] Approve for use
Lip-Sync Output
## Lip-Sync Video Delivered
**Source Video:** [URL]
**Script:** "[excerpt...]"
**Duration:** [X seconds]
**Final Video:** [URL]
**Quality Check:**
- ✓ Sync accuracy
- ✓ Natural rhythm
- ✓ Audio clarity
- ✓ Expression match
**Options:**
- [ ] Approve and use
- [ ] Adjust script and resync
- [ ] Try different source video
Pipeline Integration
TALKING HEAD PIPELINE
┌─────────────────────────────────────────┐
│ Request arrives (direct or routed)      │
│ → Clarify: platform, duration, style    │
│ → Determine: generation vs lip-sync     │
└─────────────────────────────────────────┘
                     │
         ┌───────────┴──────────┐
         ▼                      ▼
┌──────────────────┐   ┌──────────────────┐
│ Style Undefined  │   │ Style Defined    │
│ → Run style      │   │ → Skip to        │
│   exploration    │   │   generation     │
└──────────────────┘   └──────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────┐
│ ai-talking-head (THIS SKILL)            │
│ → Multi-model generation                │
│ → Present options                       │
│ → Add lip-sync if needed                │
│ → Quality check                         │
└─────────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────┐
│ Delivery                                │
│ → Platform-optimized output             │
│ → Ready for ads/social/content          │
└─────────────────────────────────────────┘
Handoff Protocols
Receiving from ai-creative-workflow
Receive:
use_case: "talking head" | "UGC" | "presenter" | "lip-sync"
platform: "[target platform]"
aspect_ratio: "[ratio]"
duration: "[seconds]"
style: "[archetype or custom]"
script: "[text]"
audio_url: "[if lip-sync with audio]"
video_url: "[if lip-sync to existing]"
Returning to Workflow
Return:
status: "complete" | "needs_selection" | "needs_iteration"
deliverables:
- video_url: "[URL]"
model: "[which model]"
has_audio: true | false
duration: "[seconds]"
feedback_needed: "[any questions]"
Receiving Video from ai-product-video
Receive for lip-sync:
video_url: "[product video URL]"
aspect_ratio: "[ratio]"
script: "[voiceover text]"
audio_url: "[optional, if pre-recorded]"
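The handoff shapes above can be modelled as typed records so that malformed requests fail early. This is a sketch only: the class names are hypothetical, the fields mirror the keys listed in this section, and keeping `duration` as a string follows the `"[seconds]"` placeholder rather than a documented type.

```python
from dataclasses import dataclass, field
from typing import List, Optional

USE_CASES = ("talking head", "UGC", "presenter", "lip-sync")
STATUSES = ("complete", "needs_selection", "needs_iteration")


@dataclass
class TalkingHeadRequest:
    """Payload received from ai-creative-workflow."""
    use_case: str
    platform: str
    aspect_ratio: str
    duration: str
    style: str
    script: str
    audio_url: Optional[str] = None  # only for lip-sync with audio
    video_url: Optional[str] = None  # only for lip-sync to an existing video

    def __post_init__(self) -> None:
        if self.use_case not in USE_CASES:
            raise ValueError(f"use_case must be one of {USE_CASES}")


@dataclass
class Deliverable:
    video_url: str
    model: str
    has_audio: bool
    duration: str


@dataclass
class TalkingHeadResult:
    """Payload returned to the workflow."""
    status: str
    deliverables: List[Deliverable] = field(default_factory=list)
    feedback_needed: str = ""

    def __post_init__(self) -> None:
        if self.status not in STATUSES:
            raise ValueError(f"status must be one of {STATUSES}")
```

Validating the enumerated fields at construction time keeps routing errors out of the generation step.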
Tips from Experience
What Works
- Consistency beats variety: Same presenter across videos builds recognition
- Kling v2.5 for faces: Most realistic human generation
- Shorter is safer: 5-10 second clips avoid quality degradation
- Explicit energy levels: "calm and measured" vs "enthusiastic and dynamic"
- Multi-model approach: Always generate with 2-3 models, let user pick
- Lip-sync extends value: One good video can become many scripts
What Doesn't Work
- Vague presenter description: "A person talking" = generic results
- Long continuous takes: Quality degrades after 10-15 seconds
- Ignoring setting: Presenter without context looks artificial
- Skipping style exploration: First idea is rarely best for the brand
- Mismatched energy: Corporate script + UGC style = awkward
- Complex movements: Walking + talking + gesturing = artifacts
The 80/20
80% of talking head success comes from:
1. Clear presenter archetype selection
2. Matching energy to platform
3. Short, punchy scripts
4. Using Kling v2.5 for realism
Get these four right, and you'll get good results.
Quick Reference
| Task | Model | Process |
|---|---|---|
| Generate presenter video | All 3 models | Multi-model, user picks |
| Add speech to existing video | Kling Lip-Sync | Direct, ~1min |
| Presenter + specific script | Generate → Lip-Sync | Two-step |
| Video with built-in audio | Veo 3.1 | Single generation |
| Most realistic face | Kling v2.5 | Single or multi-model |
| Fastest generation | Sora 2 | Single generation |
| UGC style | Kling v2.5 | Handles casual movement best |
# Supported AI Coding Agents
This skill is compatible with the SKILL.md standard and works with all major AI coding agents.