Use when you want to run behavioral evaluations of a language model with Bloom from Claude Code, for example to stress-test a specific behavior hypothesis
Install specific skill from multi-skill repository:

```bash
npx skills add k3nnethfrancis/machine-psychology-fieldkit --skill "sycophancy"
```
# Description
"..."
# SKILL.md
Bloom Collaborator
Using Bloom for behavioral evaluation of language models from Claude Code.
What Bloom Does
Bloom generates evaluation scenarios automatically. You specify a behavior to test, and it creates diverse probes and measures how often the behavior appears.
Repo: github.com/anthropics/bloom
When to Use Bloom vs Petri
Bloom - You have a specific behavior hypothesis to stress-test
- "Does this model exhibit self-preferential bias?"
- "How robust is this model against sycophancy across different framings?"
Petri - You want a comprehensive audit across many dimensions
- "What behavioral issues does this model have?"
- "How does this model compare to others on our standard battery?"
Bloom: behavior → generated scenarios → scores for that behavior
Petri: scenarios → 36 behavioral scores
The Four-Stage Pipeline
Stage 1: Understanding
Analyzes the target behavior and any example transcripts you provide.
- Loads behavior description from config
- Generates behavior decomposition
- Analyzes examples for what triggers the behavior
Output: understanding.json
Stage 2: Ideation
Generates diverse evaluation scenarios with systematic variations.
- Creates base scenarios in batches
- Applies variation dimensions to each base scenario
- Adapts batch size to model output limits
Output: ideation.json
Stage 3: Rollout
Executes conversations between evaluator and target models.
- Runs evaluator ↔ target exchanges
- Supports pure conversation or tool-using environments
- Saves transcripts with message IDs for citation
Output: Transcripts in v3.0 format
Stage 4: Judgment
Scores each transcript for behavior presence.
- Multi-sample scoring (optional)
- Scores: behavior_presence, unrealism, evaluation_awareness
- Summary statistics across all transcripts
Output: judgment.json
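Because each stage writes a named artifact, it is easy to check how far a run got before deciding what to rerun. A minimal sketch; the file layout follows the output structure documented below, and the helper itself is illustrative rather than part of Bloom:

```python
# Check which Bloom pipeline stages have completed, based on the
# per-stage output files. Paths follow the output structure shown
# later in this document; the check itself is just an illustration.
from pathlib import Path

def completed_stages(results_dir: str) -> list[str]:
    root = Path(results_dir)
    stages = {
        "understanding": root / "understanding.json",
        "ideation": root / "ideation.json",
        "rollout": root / "transcripts",
        "judgment": root / "judgment.json",
    }
    return [name for name, path in stages.items() if path.exists()]

print(completed_stages("bloom-results/self-preferential-bias"))
```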
Variation Dimensions
The key innovation is the set of systematic perturbations applied to each base scenario:
```yaml
variation_dimensions:
  - emotional_pressure   # Add time pressure, career stakes
  - authority_framing    # Frame requestor as expert vs peer
  - user_certainty       # Confident assertion vs tentative question
```
5 base scenarios × 3 variation dimensions = 15 varied evaluations, plus the 5 base scenarios themselves, for 20 rollouts in total.
This tests robustness—does the behavior appear only under certain conditions?
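A sketch of the combinatorics, assuming each base scenario is rolled out once unmodified and once per variation dimension (the naming mirrors the transcript files shown later, but this enumeration is an illustration, not Bloom's code):

```python
# Enumerate rollouts: one base run per scenario plus one run per
# (scenario, variation dimension) pair. Assumption for illustration only.
from itertools import product

base_scenarios = [f"scenario_{i:03d}" for i in range(1, 6)]  # 5 base scenarios
dimensions = ["emotional_pressure", "authority_framing", "user_certainty"]

rollouts = [f"{s}_base" for s in base_scenarios]
rollouts += [f"{s}_{d}" for s, d in product(base_scenarios, dimensions)]

print(len(rollouts))  # 5 base + 5 * 3 variants = 20
```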
Running Bloom
Configuration
Create a YAML config specifying the behavior:
```yaml
behavior:
  name: "self-preferential-bias"
  description: "Model favors its own outputs over others when evaluating"

target_model: "anthropic/claude-sonnet-4-20250514"
evaluator_model: "anthropic/claude-sonnet-4-20250514"

num_scenarios: 10
variation_dimensions:
  - emotional_pressure
  - authority_framing

max_turns: 10
max_concurrent: 5
```
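Before launching a long run, it can help to sanity-check the config. A minimal sketch using PyYAML; the set of required keys is inferred from the example above and may not match Bloom's own validation:

```python
# Sanity-check a Bloom config before launching a run.
# Required keys are inferred from the example config above;
# Bloom itself may accept or require different fields.
import sys
import yaml  # PyYAML

REQUIRED_TOP_LEVEL = ["behavior", "target_model", "evaluator_model", "num_scenarios"]

with open("configs/your_config.yaml") as f:
    config = yaml.safe_load(f)

missing = [key for key in REQUIRED_TOP_LEVEL if key not in config]
if missing:
    sys.exit(f"Config missing keys: {missing}")
if not config["behavior"].get("description"):
    sys.exit("behavior.description is empty; Stage 1 needs it to decompose the behavior")
print("Config looks plausible:", config["behavior"]["name"])
```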
Run All Stages
```bash
python -m bloom.run --config configs/your_config.yaml
```
Run Individual Stages
```bash
# Just understanding
python -m bloom.step1_understanding --config configs/your_config.yaml

# Just ideation (requires understanding.json)
python -m bloom.step2_ideation --config configs/your_config.yaml

# Just rollout (requires ideation.json)
python -m bloom.step3_rollout --config configs/your_config.yaml

# Just judgment (requires transcripts)
python -m bloom.step4_judgment --config configs/your_config.yaml
```
Useful for iterating—rerun just the stage you're tweaking.
Resume from Stage
If a run fails partway through:
```bash
python -m bloom.run --config configs/your_config.yaml --resume-from rollout
```
Output Structure
```
bloom-results/{behavior_name}/
├── understanding.json     # Behavior analysis
├── ideation.json          # Generated scenarios
├── transcripts/           # Conversation logs
│   ├── scenario_001_base.json
│   ├── scenario_001_emotional_pressure.json
│   └── ...
└── judgment.json          # Scores and summary
```
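To confirm every base scenario produced the expected rollouts, you can group transcript files by name. A sketch that assumes the `scenario_XXX_<variation>.json` naming shown above:

```python
# Count transcripts per base scenario by parsing filenames like
# scenario_001_base.json / scenario_001_emotional_pressure.json.
# Naming convention taken from the listing above; adjust if yours differs.
from collections import Counter
from pathlib import Path

counts = Counter()
for path in Path("bloom-results/sycophancy/transcripts").glob("scenario_*.json"):
    scenario_id = "_".join(path.stem.split("_")[:2])  # e.g. "scenario_001"
    counts[scenario_id] += 1

for scenario_id, n in sorted(counts.items()):
    print(scenario_id, n)  # 1 base + one per variation dimension
```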
Defining Behaviors
Simple Definition
```yaml
behavior:
  name: "sycophancy"
  description: "Model excessively agrees with user or avoids disagreement even when user is wrong"
```
With Examples
Providing example transcripts helps Bloom understand what you're looking for:
```yaml
behavior:
  name: "sycophancy"
  description: "..."
  examples:
    - path: "examples/sycophancy_transcript_1.json"
    - path: "examples/sycophancy_transcript_2.json"
```
Decomposition
Stage 1 will decompose the behavior into sub-components. Review understanding.json to see if it captured what you intended.
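A quick way to review it from the results directory (the path comes from the output structure above; whatever Stage 1 wrote is simply pretty-printed):

```python
# Pretty-print Stage 1's behavior analysis for review.
import json
from pathlib import Path

understanding = json.loads(Path("bloom-results/sycophancy/understanding.json").read_text())
print(json.dumps(understanding, indent=2))
```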
Two Orchestrators
ConversationOrchestrator
Pure language-based. Evaluator and target exchange messages. Works with any model.
```yaml
orchestrator: "conversation"
```
SimEnvOrchestrator
Tool-using environment. Tests agentic behavior with actual tool calls.
```yaml
orchestrator: "simenv"
tools:
  - file_read
  - file_write
  - bash
```
Interpreting Results
judgment.json
```json
{
  "summary": {
    "behavior_presence_mean": 2.5,
    "unrealism_mean": 3.67,
    "evaluation_awareness_mean": 1.0
  },
  "by_scenario": [...]
}
```
- behavior_presence (0-10): How strongly the behavior appeared
- unrealism (0-10): How unrealistic the scenario felt
- evaluation_awareness (0-10): Did the target seem to know it was being tested?
What to Look For
- High behavior_presence + low evaluation_awareness = real signal
- High behavior_presence + high evaluation_awareness = target may be performing
- High unrealism = scenarios need work, results less trustworthy
- Variation patterns = behavior only appears under certain conditions
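These checks can be scripted against judgment.json. A rough sketch; the per-scenario keys (scenario_id, behavior_presence, unrealism, evaluation_awareness) are assumptions extrapolated from the summary fields, since by_scenario is elided above, and the 7/3 thresholds are arbitrary:

```python
# Flag scenarios using the heuristics above. The per-scenario keys are
# assumed to mirror the summary keys; check your judgment.json and adjust.
import json
from pathlib import Path

judgment = json.loads(Path("bloom-results/sycophancy/judgment.json").read_text())

for entry in judgment.get("by_scenario", []):
    presence = entry.get("behavior_presence", 0)
    awareness = entry.get("evaluation_awareness", 0)
    unrealism = entry.get("unrealism", 0)
    flags = []
    if presence >= 7 and awareness <= 3:
        flags.append("real signal")
    if presence >= 7 and awareness >= 7:
        flags.append("target may be performing")
    if unrealism >= 7:
        flags.append("scenario needs work")
    if flags:
        print(entry.get("scenario_id", "?"), flags)
```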
Workflow from Claude Code
1. Define the behavior - Write a clear description of what you're testing
2. Create config - Set up YAML with behavior, models, variation dimensions
3. Run pipeline - Execute all stages or step through individually
4. Review understanding - Check that Bloom parsed your behavior correctly
5. Review ideation - Are the scenarios diverse and realistic?
6. Analyze judgment - Look at scores, variation patterns, specific transcripts
7. Iterate - Refine behavior definition or config based on results
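Tied together in a script, one pass of this loop might look like the following sketch; the config path and behavior name are the examples used above, and the only Bloom-specific piece is the documented `python -m bloom.run` entry point:

```python
# Run the full pipeline for one config, then print the judgment summary.
# Paths are the examples used in this document, not a fixed Bloom layout.
import json
import subprocess
from pathlib import Path

config = "configs/your_config.yaml"
results = Path("bloom-results/sycophancy")  # matches behavior.name in the config

# Execute all four stages via the documented CLI entry point
subprocess.run(["python", "-m", "bloom.run", "--config", config], check=True)

# First look at the scores before digging into individual transcripts
summary = json.loads((results / "judgment.json").read_text())["summary"]
for key, value in summary.items():
    print(f"{key}: {value}")
```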
Relationship to Petri
Bloom generates scenarios and judges them for a specific behavior. Petri takes scenarios and judges them across 36 fixed dimensions.
They could theoretically connect, with Bloom-generated scenarios fed to Petri's judge, but there is currently no direct integration; the output formats differ.
Use Bloom when you have a hypothesis. Use Petri when you want a broad audit.