Guide for creating high-quality MCP (Model Context Protocol) servers that enable LLMs to interact with external services through well-designed tools. Use when building MCP servers to integrate...
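A minimal sketch of what such a server can look like, using the Python MCP SDK's FastMCP helper; the tool name and its lookup data are hypothetical placeholders, not part of this skill:

```python
# Minimal MCP server sketch using the Python SDK's FastMCP helper.
# The "define_word" tool and its lookup table are hypothetical placeholders.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("example-server")

@mcp.tool()
def define_word(word: str) -> str:
    """Return a short definition for a word (placeholder logic)."""
    # A real server would call an external service here.
    definitions = {"mcp": "Model Context Protocol, a standard for LLM tool use."}
    return definitions.get(word.lower(), f"No definition found for {word!r}.")

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default
```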
Sync GitHub Copilot agents for an FF15-inspired OpenSpec workflow. The team includes Noctis (orchestrator and OpenSpec creator), Iris (issue management), Gladiolus (implementation), Prompto (code quality),...
Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, and benchmarking. Use when testing LLM performance, measuring AI application quality, or...
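A sketch of the automated-metric side of such a strategy: a plain-Python evaluation loop computing exact match and token-overlap F1 over (prompt, expected) pairs. The dataset and `model_fn` are placeholders for a real model call:

```python
# Evaluation-loop sketch: exact match and token-overlap F1 per example.
def token_f1(pred: str, gold: str) -> float:
    p, g = pred.lower().split(), gold.lower().split()
    common = sum(min(p.count(t), g.count(t)) for t in set(p) & set(g))
    if not common:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)

def evaluate(model_fn, dataset):
    rows = []
    for prompt, expected in dataset:
        output = model_fn(prompt)  # placeholder: swap in a real model call
        rows.append({
            "prompt": prompt,
            "exact": output.strip() == expected.strip(),
            "f1": token_f1(output, expected),
        })
    return rows
```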
Deep-dive data profiling for a specific table. Use when the user asks to profile a table, wants statistics about a dataset, asks about data quality, or needs to understand a table's structure and...
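A minimal sketch of the kind of per-column profile this produces, assuming the table is loaded into a pandas DataFrame; the statistics chosen here (null rate, distinct count, numeric summary) are illustrative:

```python
# Per-column table profile: dtype, null rate, distinct count,
# and basic statistics for the numeric columns.
import pandas as pd

def profile_table(df: pd.DataFrame) -> pd.DataFrame:
    stats = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "null_rate": df.isna().mean(),
        "distinct": df.nunique(),
    })
    numeric = df.select_dtypes("number")
    stats["min"] = numeric.min()   # NaN for non-numeric columns
    stats["max"] = numeric.max()
    stats["mean"] = numeric.mean()
    return stats
```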
Master LLM-as-a-Judge evaluation techniques including direct scoring, pairwise comparison, rubric generation, and bias mitigation. Use when building evaluation systems, comparing model outputs, or...
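A sketch of pairwise comparison with one common bias mitigation, position swapping: judge both orderings and only count a win if the verdict survives the swap. This assumes an OpenAI-compatible client; the model name is a placeholder:

```python
# Pairwise LLM-as-a-judge sketch with position-bias mitigation.
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = (
    "Question: {q}\n\nAnswer A:\n{a}\n\nAnswer B:\n{b}\n\n"
    "Which answer is better? Reply with exactly 'A' or 'B'."
)

def judge_once(q: str, a: str, b: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(q=q, a=a, b=b)}],
    )
    return resp.choices[0].message.content.strip()

def pairwise_judge(q: str, ans1: str, ans2: str) -> str:
    first = judge_once(q, ans1, ans2)   # ans1 shown in position A
    second = judge_once(q, ans2, ans1)  # swapped: ans1 shown in position B
    if first == "A" and second == "B":
        return "ans1"
    if first == "B" and second == "A":
        return "ans2"
    return "tie"  # verdicts disagree across orderings: treat as a tie
```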
Use when clarifying fuzzy boundaries, defining quality criteria, teaching by counterexample, preventing common mistakes, setting design guardrails, disambiguating similar concepts, refining...
Translates EPUB files between languages with parallel processing. Supports Japanese, English, Chinese, and other languages. Handles large files by splitting them into sections, manages multiple...
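A rough sketch of the split-and-parallelize shape described above, reading XHTML sections with ebooklib and translating them concurrently; `translate()` is a placeholder for a real translation call, and this is not this skill's actual implementation:

```python
# Parallel EPUB translation sketch: translate each document section
# concurrently, then write the modified book back out.
from concurrent.futures import ThreadPoolExecutor

import ebooklib
from ebooklib import epub

def translate(html: str, target_lang: str) -> str:
    return html  # placeholder: call a translation API here

def translate_epub(src: str, dst: str, target_lang: str = "en") -> None:
    book = epub.read_epub(src)
    sections = list(book.get_items_of_type(ebooklib.ITEM_DOCUMENT))
    with ThreadPoolExecutor(max_workers=4) as pool:
        translated = list(pool.map(
            lambda item: translate(item.get_content().decode("utf-8"), target_lang),
            sections,
        ))
    for item, html in zip(sections, translated):
        item.content = html.encode("utf-8")
    epub.write_epub(dst, book)
```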
Use when prompts produce inconsistent or unreliable outputs, need explicit structure and constraints, require safety guardrails or quality checks, involve multi-step reasoning that needs...
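An illustrative example of adding that explicit structure to a prompt: a stated role, enumerated constraints, a required output format, and a self-check step. The classification task and field names are invented for illustration:

```python
# Structured prompt template sketch: role, constraints, output schema,
# and a self-verification instruction. Doubled braces escape the JSON
# literal inside str.format().
STRUCTURED_PROMPT = """\
You are a support-ticket classifier.

Constraints:
- Use only the categories: billing, bug, feature_request, other.
- If the ticket is ambiguous, choose 'other' and explain why.

Output format (JSON only, no prose):
{{"category": "<one of the categories>", "confidence": <0.0-1.0>, "reason": "<one sentence>"}}

Before answering, verify the JSON is valid and the category is allowed.

Ticket:
{ticket}
"""

prompt = STRUCTURED_PROMPT.format(ticket="I was charged twice this month.")
```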
Research and add new strategic frameworks to the system (meta-skill). Use when the user wants to add a framework not in the library, has discovered a new framework in their domain, or asks "Can you add...
This skill should be used when the user asks to "implement LLM-as-judge", "compare model outputs", "create evaluation rubrics", "mitigate evaluation bias", or mentions direct scoring, pairwise...
Reviews pull requests and code changes for quality, security, and best practices. Use when the user asks for a code review or a PR review, or mentions reviewing changes.
Sync GitHub Copilot agents for an FF15-inspired OpenSpec workflow (non-MCP version). The team includes Noctis (orchestrator and OpenSpec creator), Iris (issue management), Gladiolus (implementation),...