# Install this skill:

```shell
npx skills add MatrixReligio/ProductVideoCreator --skill "voiceover"
```

This installs a specific skill from a multi-skill repository.
# Description
Generates multilingual voiceovers (Chinese/English) with edge-tts. Use when you need narration for a video or a voiceover synchronized to a timeline. Supports speech-rate adjustment, multiple voices, and voiceover validation.
# SKILL.md

```yaml
name: voiceover
description: Generates multilingual voiceovers (Chinese/English) with edge-tts. Use when you need narration for a video or a voiceover synchronized to a timeline. Supports speech-rate adjustment, multiple voices, and voiceover validation.
argument-hint: [voiceover script or timeline file] [language: zh/en]
```
# Voiceover Generation Skill

## Technology Choice

| Option | Pros | Cons |
|---|---|---|
| edge-tts | Free, good audio quality, multilingual | Requires network access |
| Azure TTS | More voices, more reliable | Paid |

Recommended: edge-tts
## Voice Selection

### Chinese voices (zh-CN)

| Voice ID | Gender | Style | Best for |
|---|---|---|---|
| zh-CN-XiaoxiaoNeural | Female | Warm, friendly | Product intros, tutorials |
| zh-CN-YunxiNeural | Male | Professional, composed | Corporate promos, formal content |
| zh-CN-YunjianNeural | Male | Energetic, passionate | Tech launches, motivational videos |
| zh-CN-XiaoyiNeural | Female | Young, lively | Creative content, casual topics |
| zh-CN-YunyangNeural | Male | News anchor | News, serious topics |

### English voices (en-US)

| Voice ID | Gender | Style | Best for |
|---|---|---|---|
| en-US-GuyNeural | Male | Professional, composed | Corporate promos, product intros |
| en-US-JennyNeural | Female | Warm, friendly | Tutorials, customer service |
| en-US-AriaNeural | Female | Clear, professional | News, formal content |
| en-US-DavisNeural | Male | Young, energetic | Tech content, creative videos |
| en-US-JasonNeural | Male | Energetic, passionate | Launch events, motivational videos |
| en-US-SaraNeural | Female | Young, lively | Social media, casual topics |
### Voice recommendations

Chinese videos:

| Video type | Recommended voice |
|---|---|
| Product demo | XiaoxiaoNeural (female) |
| Company intro | YunxiNeural (male) or XiaoxiaoNeural |
| Tech journey | YunjianNeural (male), for its energy |
| Tutorial | XiaoxiaoNeural (female) |
| Launch-event style | YunjianNeural (male) |

English videos:

| Video type | Recommended voice |
|---|---|
| Product demo | GuyNeural (male) or JennyNeural (female) |
| Company intro | GuyNeural (male) or AriaNeural (female) |
| Tech journey | JasonNeural (male), for its energy |
| Tutorial | JennyNeural (female) |
| Launch-event style | JasonNeural (male) or DavisNeural (male) |
## Timeline Calculation

### Core formula

```python
# final video time = screen-recording event time + offset
DEMO_START = OPENING_DURATION + FEATURES_DURATION
final_time = recording_time + DEMO_START
```
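As a quick sketch of the offset formula (the section durations here are illustrative, not from a real project):

```python
# Illustrative section lengths in seconds; real values come from your storyboard.
OPENING_DURATION = 8
FEATURES_DURATION = 14

# Everything recorded during the demo is shifted by the sections preceding it.
DEMO_START = OPENING_DURATION + FEATURES_DURATION

def to_final_time(recording_time):
    """Map a screen-recording timestamp onto the final video timeline."""
    return recording_time + DEMO_START

print(to_final_time(0.0), to_final_time(5.5))  # 22.0 27.5
```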
### Still-image video timelines

For slideshow-style (image and text) videos, the timeline is defined directly by the storyboard:

```python
# Scene timing configuration
SCENES = {
    "opening": {"start": 0, "duration": 8},
    "scene_1": {"start": 8, "duration": 14},
    "scene_2": {"start": 22, "duration": 16},
    # ...
}

# Voiceover definitions, tied directly to scene times
VOICEOVER_SEGMENTS = [
    (0.5, 7.5, "Opening narration..."),    # inside scene "opening"
    (8.5, 21.5, "Scene 1 narration..."),   # inside scene "scene_1"
    (22.5, 37.5, "Scene 2 narration..."),  # inside scene "scene_2"
]
```
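Segment boundaries can also be derived from SCENES instead of typed by hand. This helper is a sketch; the 0.5 s of silence at each scene boundary is an assumption matching the example timings:

```python
def segments_from_scenes(scenes, texts, pad=0.5):
    """Build (start, end, text) voiceover tuples from scene timings,
    leaving `pad` seconds of silence at each scene boundary."""
    segments = []
    for name, timing in scenes.items():
        start = timing["start"] + pad
        end = timing["start"] + timing["duration"] - pad
        segments.append((start, end, texts[name]))
    return segments

SCENES = {
    "opening": {"start": 0, "duration": 8},
    "scene_1": {"start": 8, "duration": 14},
}
TEXTS = {"opening": "Opening narration...", "scene_1": "Scene 1 narration..."}

print(segments_from_scenes(SCENES, TEXTS))
# [(0.5, 7.5, 'Opening narration...'), (8.5, 21.5, 'Scene 1 narration...')]
```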
## Voiceover Validation (Important)

### Automated validation function

```python
def validate_voiceover(segments, total_duration):
    """
    Validate the voiceover timeline.
    Returns: (passed, list of issues)
    """
    issues = []
    for i, seg in enumerate(segments):
        # Check 1: does the segment run longer than its target duration?
        actual_end = seg["start_time"] + seg["actual_duration"]
        if seg["actual_duration"] > seg["target_duration"] + 0.5:
            issues.append({
                "type": "duration_exceeded",
                "segment": i,
                "message": f"Segment {i}: actual ({seg['actual_duration']:.1f}s) > target ({seg['target_duration']:.1f}s)",
                "severity": "warning"
            })
        # Check 2: does the segment overlap the next one?
        if i < len(segments) - 1:
            next_start = segments[i+1]["start_time"]
            if actual_end > next_start:
                issues.append({
                    "type": "overlap",
                    "segment": i,
                    "message": f"Segments {i} and {i+1} overlap: {actual_end:.1f}s > {next_start:.1f}s",
                    "severity": "error"
                })
    # Check 3: does the last segment run past the end of the video?
    last_seg = segments[-1]
    last_end = last_seg["start_time"] + last_seg["actual_duration"]
    if last_end > total_duration + 1:
        issues.append({
            "type": "exceeds_video",
            "message": f"Voiceover ends at {last_end:.1f}s > video duration ({total_duration}s)",
            "severity": "error"
        })
    # Check 4: silent gaps between segments
    for i in range(len(segments) - 1):
        current_end = segments[i]["start_time"] + segments[i]["actual_duration"]
        next_start = segments[i+1]["start_time"]
        gap = next_start - current_end
        if gap > 3:
            issues.append({
                "type": "large_gap",
                "segment": i,
                "message": f"{gap:.1f}s of silence between segments {i} and {i+1}",
                "severity": "warning"
            })
    passed = not any(issue["severity"] == "error" for issue in issues)
    return passed, issues
```
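The `segments` argument is the list of per-segment dicts produced by the generation step. A minimal sketch of that input shape, with the overlap check applied in isolation:

```python
# Each record carries the scheduled start plus target and measured durations.
segments = [
    {"start_time": 0.5, "target_duration": 7.0, "actual_duration": 6.8},
    {"start_time": 8.5, "target_duration": 13.0, "actual_duration": 14.2},
]

# A segment overlaps its successor when its actual end time runs past
# the next segment's scheduled start.
overlaps = [
    segments[i]["start_time"] + segments[i]["actual_duration"]
    > segments[i + 1]["start_time"]
    for i in range(len(segments) - 1)
]
print(overlaps)  # [False]: 0.5 + 6.8 = 7.3s ends before the 8.5s start
```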
### Validation report output

```python
def print_validation_report(segments, total_duration):
    """Print a voiceover validation report."""
    passed, issues = validate_voiceover(segments, total_duration)
    print("╔" + "═" * 58 + "╗")
    print("║" + "Voiceover Validation Report".center(58) + "║")
    print("╠" + "═" * 58 + "╣")
    print("║ Seg │ Start  │ Target  │ Actual  │ Status" + " " * 17 + "║")
    print("╠" + "═" * 58 + "╣")
    for i, seg in enumerate(segments):
        status = "✅ OK" if seg["actual_duration"] <= seg["target_duration"] + 0.5 else "⚠️ over"
        print(f"║ {i:2d}  │ {seg['start_time']:5.1f}s │ {seg['target_duration']:5.1f}s │ {seg['actual_duration']:5.1f}s │ {status:10s} ║")
    print("╠" + "═" * 58 + "╣")
    if passed:
        print("║" + " ✅ Validation passed".ljust(58) + "║")
    else:
        print("║" + " ❌ Validation failed; fix the issues below:".ljust(58) + "║")
        for issue in issues:
            if issue["severity"] == "error":
                print(f"║   ❌ {issue['message'][:50]:50s} ║")
    print("╚" + "═" * 58 + "╝")
    return passed
```
## Complete Voiceover Script Template (V2, multilingual)

```python
#!/usr/bin/env python3
"""
Voiceover generation script V2 - with validation and multilingual support.
"""
import asyncio
import subprocess
from pathlib import Path
import json
import re

# ========== Configuration ==========
LANGUAGE = "zh"  # "zh" or "en"
VOICE = "zh-CN-YunjianNeural"  # energetic male voice, suits tech content
# English example: VOICE = "en-US-JasonNeural"
OUTPUT_DIR = Path("public/audio")
TOTAL_DURATION = 85  # total video duration in seconds

# Voiceover segments: (start time, end time, narration text)
VOICEOVER_SEGMENTS = [
    (0.5, 7.5, "Narration 1"),
    (8.5, 21.5, "Narration 2"),
    # ...
]

# ========== Utilities ==========
def get_audio_duration(file_path):
    """Return an audio file's duration in seconds."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", str(file_path)],
        capture_output=True, text=True
    )
    return float(result.stdout.strip())

def validate_voiceover(segments, total_duration):
    """Validate the voiceover timeline."""
    issues = []
    for i, seg in enumerate(segments):
        # Duration check
        if seg["actual_duration"] > seg["target_duration"] + 0.5:
            issues.append(f"⚠️ Segment {i}: over by {seg['actual_duration'] - seg['target_duration']:.1f}s")
        # Overlap check
        if i < len(segments) - 1:
            actual_end = seg["start_time"] + seg["actual_duration"]
            next_start = segments[i+1]["start_time"]
            if actual_end > next_start:
                issues.append(f"❌ Segments {i} and {i+1} overlap")
    return len([i for i in issues if i.startswith("❌")]) == 0, issues

# ========== Multilingual helpers ==========
def calculate_natural_duration(text, language):
    """Estimate how long the text takes to read at a natural pace."""
    if language == "zh":
        # Chinese: roughly 4 characters/second
        char_count = len(re.sub(r'[^\u4e00-\u9fff]', '', text))
        return char_count / 4.0
    else:
        # English: roughly 150 words/minute = 2.5 words/second
        word_count = len(text.split())
        return word_count / 2.5

# ========== Generation ==========
async def generate_segment(index, start, end, text):
    """Generate a single voiceover segment (multilingual)."""
    import edge_tts
    output_file = OUTPUT_DIR / f"vo_{index:02d}.mp3"
    duration_target = end - start
    # Compute the speech-rate adjustment (multilingual)
    natural_duration = calculate_natural_duration(text, LANGUAGE)
    if natural_duration > duration_target:
        rate_adjust = min(35, int((natural_duration / duration_target - 1) * 100))
        rate = f"+{rate_adjust}%"
    elif natural_duration < duration_target * 0.7:
        rate_adjust = min(15, int((1 - natural_duration / duration_target) * 50))
        rate = f"-{rate_adjust}%"
    else:
        rate = "+0%"
    # Synthesize the audio
    communicate = edge_tts.Communicate(text=text, voice=VOICE, rate=rate)
    await communicate.save(str(output_file))
    actual_duration = get_audio_duration(output_file)
    return {
        "index": index,
        "file": output_file.name,
        "start_time": start,
        "target_duration": duration_target,
        "actual_duration": actual_duration,
        "text": text[:20] + "...",
        "rate": rate,
        "language": LANGUAGE,
    }

def merge_audio(segments):
    """Merge the segments into one track, each at its scheduled start time."""
    filter_parts = []
    inputs = []
    for i, seg in enumerate(segments):
        inputs.extend(["-i", str(OUTPUT_DIR / seg["file"])])
        delay_ms = int(seg["start_time"] * 1000)
        filter_parts.append(f"[{i}:a]adelay={delay_ms}|{delay_ms}[a{i}];")
    mix_inputs = "".join([f"[a{i}]" for i in range(len(segments))])
    filter_parts.append(f"{mix_inputs}amix=inputs={len(segments)}:duration=longest[out]")
    output_file = OUTPUT_DIR / "synced_voiceover.mp3"
    subprocess.run([
        "ffmpeg", "-y", *inputs,
        "-filter_complex", "".join(filter_parts),
        "-map", "[out]",
        "-t", str(TOTAL_DURATION),
        str(output_file)
    ], capture_output=True)
    return output_file

# ========== Main ==========
async def main():
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    print("=" * 50)
    print(f"Voiceover generation - voice: {VOICE}")
    print("=" * 50)
    # 1. Generate the segments
    print("\n[1/3] Generating voiceover segments...")
    segments = []
    for i, (start, end, text) in enumerate(VOICEOVER_SEGMENTS):
        seg = await generate_segment(i, start, end, text)
        segments.append(seg)
        print(f"  ✓ Segment {i}: {seg['actual_duration']:.1f}s (target: {seg['target_duration']:.1f}s) rate: {seg['rate']}")
    # 2. Validate
    print("\n[2/3] Validating voiceover...")
    passed, issues = validate_voiceover(segments, TOTAL_DURATION)
    if issues:
        for issue in issues:
            print(f"  {issue}")
    if not passed:
        print("\n❌ Validation failed; check the voiceover timeline")
        return
    else:
        print("  ✅ Validation passed")
    # 3. Merge
    print("\n[3/3] Merging audio...")
    output = merge_audio(segments)
    final_duration = get_audio_duration(output)
    print(f"  ✓ Output: {output}")
    print(f"  ✓ Duration: {final_duration:.1f}s")
    # Save metadata
    metadata = {
        "voice": VOICE,
        "total_duration": TOTAL_DURATION,
        "segments": segments,
    }
    with open(OUTPUT_DIR / "voiceover_metadata.json", "w", encoding="utf-8") as f:
        json.dump(metadata, f, ensure_ascii=False, indent=2)
    print("\n" + "=" * 50)
    print("Voiceover generation complete!")
    print("=" * 50)

if __name__ == "__main__":
    asyncio.run(main())
```
## Speech-Rate Control

### Multilingual rate calculation

```python
import re

def calculate_natural_duration(text, language="zh"):
    """Estimate how long the text takes to read at a natural pace."""
    if language == "zh":
        # Chinese: roughly 4 characters/second
        char_count = len(re.sub(r'[^\u4e00-\u9fff]', '', text))
        return char_count / 4.0
    else:
        # English: roughly 150 words/minute = 2.5 words/second
        word_count = len(text.split())
        return word_count / 2.5

def calculate_rate(text, target_duration, language="zh"):
    """Compute the edge-tts rate adjustment string."""
    natural_duration = calculate_natural_duration(text, language)
    if natural_duration > target_duration:
        adjustment = min(35, int((natural_duration / target_duration - 1) * 100))
        return f"+{adjustment}%"
    elif natural_duration < target_duration * 0.7:
        adjustment = min(15, int((1 - natural_duration / target_duration) * 50))
        return f"-{adjustment}%"
    return "+0%"
```
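Worked through by hand for an English line (the sentence and target windows are made up; the arithmetic follows the formulas above):

```python
text = "Our new dashboard turns raw metrics into clear actionable insights"
word_count = len(text.split())  # 10 words
natural = word_count / 2.5      # 4.0 s at roughly 150 words/minute

# Too long for a 3.0 s window: speed up by (4.0/3.0 - 1) * 100 ≈ 33%
fast = min(35, int((natural / 3.0 - 1) * 100))

# Well under a 6.0 s window (below 70% of it): slow down, capped at 15%
slow = min(15, int((1 - natural / 6.0) * 50))

print(f"+{fast}%", f"-{slow}%")  # +33% -15%
```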
### Rate reference

| Language | Natural pace | Fastest usable | Slowest usable |
|---|---|---|---|
| Chinese | 4 chars/sec | +35% (5.4 chars/sec) | -15% (3.4 chars/sec) |
| English | 150 words/min | +35% (~200 words/min) | -15% (~130 words/min) |

### Guidelines

| Situation | What to do |
|---|---|
| Voiceover too long | Trim the text first; only then speed up |
| Voiceover too short | Slow down slightly, or extend the visuals |
| Rate above +35% | The text must be trimmed |
## FFmpeg Audio Merging

### The adelay filter

```shell
# Delay a single track by 5 seconds
ffmpeg -i input.mp3 -af "adelay=5000|5000" output.mp3

# Place multiple tracks at specific points on the timeline, then mix them
ffmpeg -i vo_01.mp3 -i vo_02.mp3 -i vo_03.mp3 \
  -filter_complex \
  "[0:a]adelay=1000|1000[a0];\
  [1:a]adelay=5500|5500[a1];\
  [2:a]adelay=10500|10500[a2];\
  [a0][a1][a2]amix=inputs=3:duration=longest[out]" \
  -map "[out]" output.mp3
```
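The same filter graph that `merge_audio` assembles can be sketched for three segments starting at 1.0 s, 5.5 s and 10.5 s; this is plain string assembly, runnable without ffmpeg installed:

```python
starts = [1.0, 5.5, 10.5]  # scheduled start times in seconds

parts = []
for i, start in enumerate(starts):
    delay_ms = int(start * 1000)
    # adelay takes one delay per channel; repeat it for stereo
    parts.append(f"[{i}:a]adelay={delay_ms}|{delay_ms}[a{i}];")
labels = "".join(f"[a{i}]" for i in range(len(starts)))
parts.append(f"{labels}amix=inputs={len(starts)}:duration=longest[out]")

filter_complex = "".join(parts)
print(filter_complex)
```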
### Loudness normalization

```shell
# Normalize to -16 LUFS (a common broadcast target)
ffmpeg -i input.mp4 -af "loudnorm=I=-16:TP=-1.5:LRA=11" -c:v copy output.mp4
```
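loudnorm is more accurate when run in two passes: a first pass with `print_format=json` measures the input and prints the stats to stderr, and those values feed the second pass via `measured_I`, `measured_TP`, etc. A sketch of pulling the measurements out of the first pass; the stderr text below is a made-up stand-in for real ffmpeg output:

```python
import json

# Stand-in for the tail of stderr after a first pass such as:
#   ffmpeg -i input.mp4 -af loudnorm=I=-16:TP=-1.5:LRA=11:print_format=json -f null -
stderr_tail = """[Parsed_loudnorm_0 @ 0x5555]
{
    "input_i" : "-23.10",
    "input_tp" : "-4.50",
    "input_lra" : "6.80",
    "input_thresh" : "-33.50"
}"""

# The measurements are the last {...} block in the output
start = stderr_tail.rindex("{")
measured = json.loads(stderr_tail[start:])
print(measured["input_i"])  # -23.10
```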
## Common Issues

| Issue | Fix |
|---|---|
| Voiceover out of sync with visuals | Re-check the time-offset calculation |
| Speech too fast | Trim the text or lower the rate |
| Segments overlap | Adjust start times or shorten the text |
| Long silent gaps | Fill them with transitional narration |
| Inconsistent volume | Normalize with FFmpeg loudnorm |
| Network failures | Add a retry mechanism |
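The retry advice in the last row can be sketched as a small async wrapper with exponential backoff; `synthesize` here is a hypothetical stand-in for the edge-tts call, failing twice to mimic a flaky network:

```python
import asyncio

async def with_retries(coro_factory, attempts=3, base_delay=1.0):
    """Run an async operation, retrying with exponential backoff on failure."""
    for attempt in range(attempts):
        try:
            return await coro_factory()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the error
            await asyncio.sleep(base_delay * 2 ** attempt)

# Stand-in for a flaky TTS call: fails twice, then succeeds.
calls = {"n": 0}

async def synthesize():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network failure")
    return "vo_00.mp3"

result = asyncio.run(with_retries(synthesize, attempts=3, base_delay=0.01))
print(result)  # vo_00.mp3 after two retries
```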
# Supported AI Coding Agents
This skill is compatible with the SKILL.md standard and works with all major AI coding agents:
Amp
Antigravity
Claude Code
Clawdbot
Codex
Cursor
Droid
Gemini CLI
GitHub Copilot
Goose
Kilo Code
Kiro CLI
OpenCode
Roo Code
Trae
Windsurf
Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.