Use when adding new error messages to React, or seeing "unknown error code" warnings.
npx skills add Aradotso/trending-skills --skill "autoresearch: vitest-speed"
Install specific skill from multi-skill repository
# Description
"Enable parallel test workers in vitest config",
# SKILL.md
---
name: pi-autoresearch-loop
description: Autonomous experiment loop for pi that continuously tries optimizations, measures results, and keeps what works
triggers:
- autoresearch
- autonomous experiment loop
- optimize automatically
- run experiment loop
- continuous optimization
- benchmark and improve
- start autoresearch session
- keep what works discard what doesnt
---
# pi-autoresearch — Autonomous Experiment Loop
> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection
Autonomous experiment loop extension for [pi](https://github.com/antiwork/pi). Continuously proposes changes, benchmarks them, commits wins, reverts losses, and repeats — forever. Works for any measurable target: test speed, bundle size, build time, LLM training loss, Lighthouse scores.
---
## Installation
```bash
pi install https://github.com/davebcn87/pi-autoresearch
Manual install:
cp -r extensions/pi-autoresearch ~/.pi/agent/extensions/
cp -r skills/autoresearch-create ~/.pi/agent/skills/
Then /reload in pi.
Quick Start
/skill:autoresearch-create
The agent will:
1. Ask about your goal, command, metric, and files in scope (or infer from context)
2. Create a branch
3. Write autoresearch.md and autoresearch.sh
4. Run the baseline
5. Start looping immediately — no further input needed
Core Concepts
Two-file persistence model
Every session is fully recoverable from two files:
| File | Purpose |
|---|---|
autoresearch.jsonl |
Append-only log — one JSON line per run (metric, status, commit, description) |
autoresearch.md |
Living document — objective, what's been tried, dead ends, key wins |
A fresh agent with zero memory can read these two files and continue exactly where the previous session left off.
Session files written by the skill
| File | Purpose |
|---|---|
autoresearch.md |
Session document — objective, metrics, files in scope, experiment history |
autoresearch.sh |
Benchmark script — pre-checks, runs the workload, outputs METRIC name=number lines |
autoresearch.checks.sh |
(optional) Backpressure checks — tests, types, lint. Failures block keep |
Extension Tools
init_experiment
One-time session configuration. Call once at session start.
await init_experiment({
name: "vitest-speed",
metric: "seconds",
unit: "s",
direction: "lower", // "lower" | "higher"
});
run_experiment
Runs any shell command, times wall-clock duration, captures stdout/stderr.
const result = await run_experiment({
command: "pnpm test --run",
timeout_seconds: 120, // optional, default 300
checks_timeout_seconds: 300, // optional, for checks script
});
// result: { exit_code, duration_seconds, stdout, stderr }
log_experiment
Records result, auto-commits on keep, updates the status widget and dashboard.
await log_experiment({
metric_value: 42.3,
status: "keep", // "keep" | "discard" | "crash" | "checks_failed"
description: "Enable parallel test workers in vitest config",
commit_message: "perf: parallel vitest workers → 42.3s (-18%)",
});
The Autonomous Loop
Once started, the agent runs this cycle indefinitely:
propose change → edit files → run_experiment → measure metric
↓
metric improved?
YES → log_experiment(keep) → auto-commit → update autoresearch.md
NO → log_experiment(discard) → git revert → try next idea
↓
repeat forever (until interrupted)
Interrupt anytime with Escape, then ask for a summary of what was tried.
Benchmark Script Format
autoresearch.sh must output at least one METRIC line:
#!/bin/bash
set -euo pipefail
# Pre-checks
[ -f package.json ] || { echo "No package.json"; exit 1; }
# Run workload
pnpm test --run
# Output metric — required format
echo "METRIC seconds=$SECONDS"
Multiple metrics are supported:
echo "METRIC duration_seconds=42.3"
echo "METRIC test_count=847"
echo "METRIC memory_mb=512"
The primary metric (set in init_experiment) drives keep/discard decisions. Others are recorded for analysis.
Backpressure Checks (Optional)
Create autoresearch.checks.sh to guard correctness after every passing benchmark:
#!/bin/bash
set -euo pipefail
pnpm test --run # full test suite
pnpm typecheck # TypeScript
pnpm lint # ESLint / Biome
Behavior:
- File absent → loop runs exactly as before, no change
- File present → runs automatically after every benchmark that exits 0
- Checks time does not count toward the primary metric
- Checks failure → logged as checks_failed, changes reverted (same as crash)
- Dashboard shows checks_failed separately from crash so you can distinguish correctness failures from benchmark errors
UI
Status Widget
Always visible above the editor:
🔬 autoresearch 12 runs 8 kept │ best: 42.3s
Dashboard
Open with /autoresearch — full results table with status, metric values, descriptions, and best run highlighted.
Ctrl+X— toggle dashboardEscape— close dashboard / interrupt loop
Example Domains
// Test speed
{
command: "pnpm test --run",
metric: "seconds",
direction: "lower",
scope: ["vitest.config.ts", "src/**/*.test.ts"],
}
// Bundle size
{
command: "pnpm build && du -sb dist | cut -f1",
metric: "bytes",
direction: "lower",
scope: ["vite.config.ts", "src/index.ts"],
}
// LLM training loss
{
command: "uv run train.py --epochs 1",
metric: "val_bpb",
direction: "lower",
scope: ["train.py", "model.py", "config.yaml"],
}
// Build speed
{
command: "pnpm build",
metric: "seconds",
direction: "lower",
scope: ["tsconfig.json", "vite.config.ts"],
}
// Lighthouse performance
{
command: "lighthouse http://localhost:3000 --output=json | jq '.categories.performance.score'",
metric: "score",
direction: "higher",
scope: ["src/pages/index.tsx", "public/"],
}
autoresearch.md Structure
The skill writes and maintains this file throughout the session:
# autoresearch: vitest-speed
## Objective
Reduce test suite wall-clock time. Baseline: 51.7s.
## Metric
- Name: seconds
- Direction: lower is better
- Baseline: 51.7s
- Best so far: 42.3s (run 8)
## Files in scope
- vitest.config.ts
- src/**/*.test.ts
## What's been tried
- [kept] Run 8: Enable parallel workers → 42.3s (-18%)
- [discarded] Run 5: Increase pool size to 16 → 53.1s (+3%)
- [kept] Run 3: Disable coverage in CI → 47.8s (-8%)
## Dead ends
- Increasing pool beyond 8 causes memory pressure, net negative
## Next ideas
- [ ] Try forks pool instead of threads
- [ ] Investigate slow test files with --reporter=verbose
autoresearch.jsonl Format
One JSON object per line:
{"run":1,"metric_value":51.7,"status":"keep","description":"baseline","commit":"a1b2c3d","timestamp":"2025-01-15T10:00:00Z"}
{"run":2,"metric_value":49.2,"status":"keep","description":"disable coverage","commit":"e4f5g6h","timestamp":"2025-01-15T10:03:21Z"}
{"run":3,"metric_value":53.1,"status":"discard","description":"increase pool to 16","commit":null,"timestamp":"2025-01-15T10:07:45Z"}
{"run":4,"metric_value":null,"status":"crash","description":"invalid vitest config syntax","commit":null,"timestamp":"2025-01-15T10:09:12Z"}
Read the log programmatically:
import { readFileSync } from "fs";
const runs = readFileSync("autoresearch.jsonl", "utf-8")
.trim()
.split("\n")
.map((line) => JSON.parse(line));
const kept = runs.filter((r) => r.status === "keep");
const best = kept.reduce((a, b) =>
a.metric_value < b.metric_value ? a : b
);
console.log(`Best: ${best.metric_value} — ${best.description}`);
Resuming a Session
The agent can resume from either file. Recommended resume prompt:
Read autoresearch.jsonl and autoresearch.md, then continue the experiment loop.
Don't restart — pick up from run N and keep going.
Or use the skill:
/skill:autoresearch-create resume
Architecture
┌──────────────────────┐ ┌──────────────────────────┐
│ Extension (global) │ │ Skill (per-domain) │
│ │ │ │
│ run_experiment │◄────│ command: pnpm test │
│ log_experiment │ │ metric: seconds (lower) │
│ widget + dashboard │ │ scope: vitest configs │
│ │ │ ideas: pool, parallel… │
└──────────────────────┘ └──────────────────────────┘
│
▼
autoresearch.jsonl ← append-only run log
autoresearch.md ← living session document
The extension is domain-agnostic infrastructure. The skill encodes domain knowledge. One extension serves unlimited domains.
Troubleshooting
Loop not starting after skill runs
- Check that autoresearch.sh is executable: chmod +x autoresearch.sh
- Verify the script outputs a METRIC name=number line on success
- Run bash autoresearch.sh manually to debug
Widget not showing
- Run /reload in pi to reload the extension
- Confirm the extension is in ~/.pi/agent/extensions/pi-autoresearch/
run_experiment times out
- Increase timeout_seconds in your run_experiment call
- Default is 300s — long benchmarks (LLM training) may need 3600+
Checks script blocking everything
- Check autoresearch.checks.sh exit codes manually: bash autoresearch.checks.sh
- Increase checks_timeout_seconds if tests are slow
- Remove the file temporarily to isolate whether the benchmark or checks are failing
Session lost after context reset
- The agent needs only autoresearch.jsonl + autoresearch.md to resume
- Both files are committed to the branch — they survive any context reset
- Use the resume prompt above to continue
Metric value not captured
- Ensure the benchmark script exits 0 on success
- The METRIC line must be on stdout, not stderr
- Format must be exactly METRIC name=number (no spaces around =)
License
MIT
```
# Supported AI Coding Agents
This skill is compatible with the SKILL.md standard and works with all major AI coding agents:
Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.