Install this specific skill from the multi-skill repository:

```bash
npx skills add yha9806/claude-skills-vulca --skill "culture-batch-generator"
```
# Description
Multi-culture batch critique generation skill, v2.0. Supports batch critique generation for the Chinese/Korean/Indian/Mural/Western/Islamic/Japanese/Hermitage cultures. Driven by an incremental task list, with automatic de-duplication checks to guarantee critique uniqueness.
# SKILL.md
name: culture-batch-generator
description: Multi-culture batch critique generation skill, v2.0. Supports batch critique generation for the Chinese/Korean/Indian/Mural/Western/Islamic/Japanese/Hermitage cultures. Driven by an incremental task list, with automatic de-duplication checks to guarantee critique uniqueness.
Multi-Culture Batch Generator v2.0
## v2.0 updates (2025-12-27)
- ✅ Driven by an incremental task list (pending_tasks.json)
- ✅ Checks whether an image already has a critique before processing
- ✅ Checks generated critiques against existing ones for duplicates
- ✅ Supports 8 cultures
- ✅ A single failed image does not abort the whole batch
## Core features
- Supports 8 cultures: chinese, korean, indian, mural, western, islamic, japanese, hermitage
- Task-list driven, avoiding duplicate processing
- Automatically selects the matching culture agent
- Validates critique uniqueness after generation
- Saves checkpoints automatically, supporting resumable runs
## Culture configuration (v2.1 - unified 10 images/batch)

| Culture | Agent | Batch size | Chinese length | Dimension requirement |
|---|---|---|---|---|
| chinese | chinese-painting-critique-agent | 10 | ≥300 chars | ≥21 CN_ |
| korean | korean-painting-critique-agent | 10 | ≥200 chars | ≥18 KR_ |
| indian | indian-bilingual-agent-v3 | 10 | ≥400 chars | ≥21 IN_ |
| mural | mural-bilingual-agent | 10 | ≥200 chars | ≥21 MU_ |
| western | western-image-critique-agent | 10 | ≥200 chars | ≥18 WE_ |
| islamic | islamic-bilingual-agent | 10 | ≥200 chars | ≥20 IS_ |
| japanese | japanese-image-critique-agent | 10 | ≥200 chars | ≥19 JP_ |
| hermitage | hermitage-dimension-agent | 10 | ≥200 chars | ≥21 WS_ |
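The table above can be mirrored as a plain Python dict so batch scripts can validate agent output programmatically. This is an illustrative sketch: the dict and helper names are not part of the skill's actual code, and the field names are assumptions.

```python
# Per-culture requirements, transcribed from the configuration table (sketch).
CULTURE_CONFIG = {
    'chinese':   {'agent': 'chinese-painting-critique-agent', 'min_zh_chars': 300, 'min_dims': 21, 'dim_prefix': 'CN_'},
    'korean':    {'agent': 'korean-painting-critique-agent',  'min_zh_chars': 200, 'min_dims': 18, 'dim_prefix': 'KR_'},
    'indian':    {'agent': 'indian-bilingual-agent-v3',       'min_zh_chars': 400, 'min_dims': 21, 'dim_prefix': 'IN_'},
    'mural':     {'agent': 'mural-bilingual-agent',           'min_zh_chars': 200, 'min_dims': 21, 'dim_prefix': 'MU_'},
    'western':   {'agent': 'western-image-critique-agent',    'min_zh_chars': 200, 'min_dims': 18, 'dim_prefix': 'WE_'},
    'islamic':   {'agent': 'islamic-bilingual-agent',         'min_zh_chars': 200, 'min_dims': 20, 'dim_prefix': 'IS_'},
    'japanese':  {'agent': 'japanese-image-critique-agent',   'min_zh_chars': 200, 'min_dims': 19, 'dim_prefix': 'JP_'},
    'hermitage': {'agent': 'hermitage-dimension-agent',       'min_zh_chars': 200, 'min_dims': 21, 'dim_prefix': 'WS_'},
}

def meets_requirements(culture: str, critique_zh: str, dimensions: dict) -> bool:
    """Check one critique against the table's length and dimension minimums."""
    cfg = CULTURE_CONFIG[culture]
    # Count only dimension keys carrying this culture's prefix (e.g. KR_)
    n_dims = sum(1 for k in dimensions if k.startswith(cfg['dim_prefix']))
    return len(critique_zh) >= cfg['min_zh_chars'] and n_dims >= cfg['min_dims']
```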
## Hard-coded constraints (v2.1)
- Per batch: fixed at 10 images
- Max parallelism: 4 agents
- Return protocol: return only a summary; write full data to the checkpoint
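The first two constraints can be captured as constants in batch scripts, with a small helper to cut a task list into fixed-size batches. A minimal sketch; the constant and function names are illustrative, not part of the skill:

```python
# Hard-coded constraints (v2.1), expressed as constants (illustrative sketch).
BATCH_SIZE = 10    # images per batch, fixed
MAX_PARALLEL = 4   # at most this many agents should run at once

def chunk(tasks: list, size: int = BATCH_SIZE) -> list:
    """Split a task list into consecutive fixed-size batches."""
    return [tasks[i:i + size] for i in range(0, len(tasks), size)]
```

For example, 25 pending tasks would yield batches of 10, 10, and 5.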
## Quick start

### Step 0: Generate/update the task list (required)

```bash
cd /mnt/i/VULCA\ 2.0/VULCA2.0_Project
source .venv/bin/activate
python scripts/database/incremental_scanner.py
```
### Step 1: Inspect pending tasks

```python
import json

with open('experiments/checkpoints/pending_tasks.json') as f:
    tasks = json.load(f)

print("=== Task summary ===")
for culture, stats in tasks['summary'].items():
    print(f"{culture}: {stats['total_pending']} pending")
```
### Step 2: Extract a batch of tasks

```python
import json

def get_batch_tasks(culture: str, batch_size: int = 10) -> list:
    """Extract a batch of tasks for the given culture from the task list."""
    with open('experiments/checkpoints/pending_tasks.json') as f:
        tasks = json.load(f)
    # Filter pending tasks for the given culture
    culture_tasks = [
        t for t in tasks['tasks']
        if t['culture'] == culture and t['status'] == 'pending'
    ]
    return culture_tasks[:batch_size]

# Example: fetch 10 western tasks
batch = get_batch_tasks('western', 10)
print(f"Fetched {len(batch)} tasks")
```
## Batch processing flow (v2.0)

### Step 1: Pre-check - confirm images are unprocessed

```python
import os
import lancedb

def pre_check(image_paths: list) -> dict:
    """Check whether images are already in the database."""
    db = lancedb.connect('/home/yhryzy/vulca_lancedb')
    pairs = db.open_table('matched_pairs')
    existing = pairs.search().limit(20000).to_list()
    db_images = set(os.path.basename(r.get('optimized_path') or r.get('image_path', ''))
                    for r in existing)
    results = {'new': [], 'existing': []}
    for path in image_paths:
        basename = os.path.basename(path)
        if basename in db_images:
            results['existing'].append(path)
        else:
            results['new'].append(path)
    return results

# Usage
check = pre_check([t['image_path'] for t in batch])
print(f"New images: {len(check['new'])}, already present: {len(check['existing'])}")

# Process only the new images
batch_to_process = [t for t in batch if t['image_path'] in check['new']]
```
### Step 2: Invoke the matching agent

```python
import json

# Agent mapping
CULTURE_AGENTS = {
    'chinese': 'chinese-painting-critique-agent',
    'korean': 'korean-painting-critique-agent',
    'indian': 'indian-bilingual-agent-v3',
    'mural': 'mural-bilingual-agent',
    'western': 'western-image-critique-agent',
    'islamic': 'islamic-bilingual-agent',
    'japanese': 'japanese-image-critique-agent',
    'hermitage': 'hermitage-dimension-agent'
}

# Save the batch input
input_file = f'experiments/checkpoints/{culture}_batch_{batch_id:03d}_input.json'
with open(input_file, 'w') as f:
    json.dump(batch_to_process, f, indent=2, ensure_ascii=False)

# Invoke the Task tool
# Task(
#     subagent_type=CULTURE_AGENTS[culture],
#     prompt=f"Process the images in {input_file} and generate bilingual critiques."
# )
```
### Step 3: Post-validation - check critique uniqueness

```python
import json
import lancedb

def post_validate(new_critiques: list) -> dict:
    """Check newly generated critiques against existing ones for duplicates."""
    db = lancedb.connect('/home/yhryzy/vulca_lancedb')
    pairs = db.open_table('matched_pairs')
    existing = pairs.search().limit(20000).to_list()
    # Build a fingerprint set of existing critiques
    existing_zh_fingerprints = set()
    for r in existing:
        zh = r.get('critique_zh', '')
        if zh and len(zh) > 100:
            existing_zh_fingerprints.add(zh[:200])
    results = {'unique': [], 'duplicate': []}
    for c in new_critiques:
        zh = c.get('critique_zh', '')
        fingerprint = zh[:200] if zh else ''
        if fingerprint in existing_zh_fingerprints:
            results['duplicate'].append({
                'image_path': c.get('filepath') or c.get('image_path'),
                'reason': 'critique_zh already exists'
            })
        else:
            results['unique'].append(c)
            # Add to the fingerprint set to catch duplicates within the batch
            existing_zh_fingerprints.add(fingerprint)
    return results

# Usage
with open(output_file) as f:
    critiques = json.load(f)
validation = post_validate(critiques)
print(f"Unique: {len(validation['unique'])}, duplicates: {len(validation['duplicate'])}")

# Save only the unique critiques
if validation['duplicate']:
    print("Warning: critiques for the following images duplicate existing ones and must be regenerated:")
    for d in validation['duplicate']:
        print(f"  - {d['image_path']}")
```
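To act on the "save only the unique critiques" step, a small helper can rewrite the batch output file with just the unique entries before merging. This is a sketch; the function name is illustrative and assumes the dict shape returned by `post_validate()`:

```python
import json

def keep_unique(validation: dict, output_path: str) -> int:
    """Overwrite the batch output file with only the unique critiques.

    `validation` is the dict returned by post_validate(); duplicates are
    dropped so they never reach the LanceDB merge step. Returns the count kept.
    """
    unique = validation['unique']
    with open(output_path, 'w') as f:
        json.dump(unique, f, indent=2, ensure_ascii=False)
    return len(unique)
```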
### Step 4: Update task status

```python
import json

def update_task_status(task_ids: list, new_status: str):
    """Update the status of tasks in the task list."""
    with open('experiments/checkpoints/pending_tasks.json') as f:
        tasks = json.load(f)
    for task in tasks['tasks']:
        if task['task_id'] in task_ids:
            task['status'] = new_status
    with open('experiments/checkpoints/pending_tasks.json', 'w') as f:
        json.dump(tasks, f, indent=2, ensure_ascii=False)

# Mark as completed
completed_ids = [t['task_id'] for t in batch_to_process]
update_task_status(completed_ids, 'completed')
```
## Full batch script template

```python
#!/usr/bin/env python3
"""
Enhanced batch processing script template
"""
import json
import os
import lancedb

# Configuration
CULTURE = 'western'  # change to the target culture
BATCH_SIZE = 10
BATCH_ID = 1

# Step 1: Fetch tasks
with open('experiments/checkpoints/pending_tasks.json') as f:
    all_tasks = json.load(f)
culture_tasks = [
    t for t in all_tasks['tasks']
    if t['culture'] == CULTURE and t['status'] == 'pending'
][:BATCH_SIZE]
print(f"Fetched {len(culture_tasks)} {CULTURE} tasks")

# Step 2: Pre-check
db = lancedb.connect('/home/yhryzy/vulca_lancedb')
pairs = db.open_table('matched_pairs')
existing = pairs.search().limit(20000).to_list()
db_images = set(os.path.basename(r.get('optimized_path') or r.get('image_path', ''))
                for r in existing)
tasks_to_process = [
    t for t in culture_tasks
    if os.path.basename(t['image_path']) not in db_images
]
print(f"Pre-check passed: {len(tasks_to_process)} tasks")

# Step 3: Save the batch input
input_file = f'experiments/checkpoints/{CULTURE}_batch_{BATCH_ID:03d}_input.json'
with open(input_file, 'w') as f:
    json.dump(tasks_to_process, f, indent=2, ensure_ascii=False)
print(f"Batch input saved: {input_file}")
print("Process this file with the matching agent")
```
## Merging into LanceDB

Merge through the data governance gateway (which includes de-duplication checks):

```python
import json
from scripts.database.data_ingestion import insert

# Read the batch output
with open(f'experiments/checkpoints/{culture}_batch_{batch_id:03d}_output.json') as f:
    records = json.load(f)

# Merge through the governance gateway (automatic de-duplication)
result = insert(records, source=f'{culture}_expansion_batch_{batch_id:03d}')
print(f"Inserted: {result.inserted}, Rejected: {result.rejected_quality}, Duplicates: {result.rejected_duplicate}")
```
## Checkpoint mechanism

Task list: experiments/checkpoints/pending_tasks.json

```json
{
  "metadata": {
    "generated_at": "2025-12-27T15:31:44",
    "db_image_count": 6466,
    "local_image_count": 8502,
    "regen_image_count": 927
  },
  "summary": {
    "western": {"unprocessed": 584, "regen": 339, "total_pending": 629},
    ...
  },
  "tasks": [
    {
      "task_id": "TASK_ABC12345",
      "culture": "western",
      "image_file": "artist-title_hash.jpg",
      "image_path": "/mnt/i/VULCA 2.0/.../image.jpg",
      "task_type": "new",       // or "regen"
      "status": "pending",      // pending -> processing -> completed/failed
      "created_at": "2025-12-27T15:31:44"
    }
  ]
}
```
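Given this schema, overall progress can be tallied straight from the file's task array. A sketch under the schema above; the function name is illustrative:

```python
def count_by_status(payload: dict) -> dict:
    """Tally tasks per (culture, status) pair from a pending_tasks.json payload."""
    counts: dict = {}
    for t in payload['tasks']:
        key = (t['culture'], t['status'])
        counts[key] = counts.get(key, 0) + 1
    return counts
```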
## Execution priority

Ordered by workload, smallest first:
1. Hermitage (16 items): smallest, finishes quickly
2. Japanese (93 items): requires Japanese-language expertise
3. Indian (126 items): medium
4. Islamic (178 items): medium
5. Mural (444 items): large
6. Western (629 items): large
7. Chinese (1338 items): largest
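The smallest-first rule above can be derived from the `summary` block of pending_tasks.json rather than hard-coded, so the ordering stays correct as counts change. A sketch; the function name is illustrative:

```python
def priority_order(summary: dict) -> list:
    """Return cultures ordered by total_pending, smallest workload first."""
    return sorted(summary, key=lambda c: summary[c]['total_pending'])
```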
# Supported AI Coding Agents
This skill follows the SKILL.md standard and works with AI coding agents that support it. Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.