Install the specific skill from the multi-skill repository:

```shell
npx skills add EsshUwU/music-skill --skill "music-skill"
```
# Description
Generate original music and remix existing MIDI using Python, pretty_midi, and FluidSynth. Use this whenever users ask to compose, write, or generate music of any kind — piano pieces, orchestral arrangements, soundtracks, beats, melodies, lo-fi, anime OSTs, cinematic scores, synth tracks, or any other genre. Also use for remixing, re-orchestrating, or arranging existing MIDI files. Triggers on any music-composition request even if the user doesn't explicitly say "MIDI" or "compose" — phrases like "make me a song," "create a beat," "write something that sounds like X," or "orchestrate this" all count.
# SKILL.md
```yaml
name: music-skill
description: Generate original music and remix existing MIDI using Python, pretty_midi, and FluidSynth. Use this whenever users ask to compose, write, or generate music of any kind — piano pieces, orchestral arrangements, soundtracks, beats, melodies, lo-fi, anime OSTs, cinematic scores, synth tracks, or any other genre. Also use for remixing, re-orchestrating, or arranging existing MIDI files. Triggers on any music-composition request even if the user doesn't explicitly say "MIDI" or "compose" — phrases like "make me a song," "create a beat," "write something that sounds like X," or "orchestrate this" all count.
```
Music Skill
Two workflows, in order of frequency:
1. Create-Music — generate brand-new compositions from a text prompt
2. Remix-Music — re-orchestrate or rearrange an existing MIDI file
Create-Music
1) Environment and dependency checks (always first)
Verify before writing any composition code:
```shell
python -c "import pretty_midi, numpy, scipy, mido; print('python deps ok')"
python -c "import fluidsynth; print('pyfluidsynth ok')"
fluidsynth --version
ffmpeg -version
```
Soundfont setup
Check for .sf2 files in soundfonts/. If the directory is missing, create it:
```shell
mkdir -p soundfonts
```
If no .sf2 file exists, tell the user they need one and suggest these free options:
- FluidR3_GM — best general-purpose GM soundfont (141 MB). Search "FluidR3_GM.sf2 download"
- GeneralUser GS — lighter alternative (~30 MB). Available at schristiancollins.com
- MuseScore_General — ships with MuseScore, high quality
When the user has a custom .sf2, run scripts/list_program.py to discover available instruments:
```shell
python scripts/list_program.py "soundfonts/YourFont.sf2"
```
This matters because custom soundfonts may map programs differently than GM standard, and some banks contain gems that won't show up in a GM reference table.
2) Output folder policy
Create a descriptive folder in the project root for each request:
```
cool_synth_45sec_high_and_slow_parts/
cinematic_piano_60sec_soft_intro_big_finale/
anime_ending_credits_f_major/
```
Put all outputs in that folder: generate.py, *.mid, *.wav, and optionally *.mp3.
Create the folder as the first step after the user's request; if you then need to run scripts or save terminal output, save them in that folder too.
3) Composition approach
Use pretty_midi to construct compositions note-by-note. This gives full control over timing, velocity, and articulation — things that make the difference between "sounds like a test file" and "sounds like music."
Reference: https://craffel.github.io/pretty-midi/
Section structure and intensity
Every piece needs an arc. Map the user's prompt into sections and assign each an intensity multiplier that shapes velocity and instrumentation density across the piece:
```python
SECTIONS = [
    (0, 8, "intro"),
    (8, 24, "verse"),
    (24, 40, "chorus"),
    (40, 52, "climax"),
    (52, 60, "outro"),
]
INTENSITY = {
    "intro": 0.40,
    "verse": 0.55,
    "chorus": 0.85,
    "climax": 1.00,
    "outro": 0.35,
}
```
Use intensity to scale velocities: `vel = int(base_velocity * INTENSITY[section] * instrument_mix_level)`. This creates natural dynamics — the climax is loud, the intro and outro breathe.
Sections don't need to be named exactly like this — for a lo-fi piece you might have "chill_loop / variation / breakdown / chill_loop_reprise." The point is: every piece has shape, not just a flat sequence of notes.
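Concretely, the intensity map can drive a small velocity helper. This is a minimal sketch (the function names `section_at` and `scaled_velocity` are illustrative, not part of the skill):

```python
# Section map and intensity multipliers, mirroring the tables above.
SECTIONS = [
    (0, 8, "intro"),
    (8, 24, "verse"),
    (24, 40, "chorus"),
    (40, 52, "climax"),
    (52, 60, "outro"),
]
INTENSITY = {"intro": 0.40, "verse": 0.55, "chorus": 0.85,
             "climax": 1.00, "outro": 0.35}

def section_at(t):
    """Name of the section active at time t (seconds)."""
    for start, end, name in SECTIONS:
        if start <= t < end:
            return name
    return SECTIONS[-1][2]  # clamp past-the-end times to the outro

def scaled_velocity(t, base_velocity=96, mix_level=1.0):
    """Velocity shaped by the section's intensity and the instrument's mix level."""
    vel = int(base_velocity * INTENSITY[section_at(t)] * mix_level)
    return max(1, min(127, vel))  # keep inside the valid MIDI velocity range
```

Every note a composition script emits can pass through a helper like this, so dynamics follow the arc automatically instead of being set per note.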
Chord progressions
Choose progressions that match the mood. Some reliable starting points:
| Mood | Progression | Example key |
|---|---|---|
| Bright / pop / anime | I – V – vi – IV | C: C-G-Am-F |
| Epic / cinematic | i – VI – III – VII | Am: Am-F-C-G |
| Nostalgic / bittersweet | I – vi – IV – V | F: F-Dm-Bb-C |
| Tense / dark | i – iv – v – i | Dm: Dm-Gm-Am-Dm |
| Jazz / sophisticated | ii – V – I – vi | C: Dm7-G7-Cmaj7-Am7 |
| Hopeful resolution | IV – V – iii – vi | G: C-D-Bm-Em |
Don't be afraid to borrow from multiple progressions or add passing chords. The chord progression is the harmonic skeleton — get it right and even simple melodies will sound good over it.
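As a sketch of how a progression row becomes concrete pitches, here is a minimal diatonic-triad expander for a major key (the helper name `triad` is illustrative; a real piece would also handle sevenths, minor keys, and borrowed chords):

```python
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets from the tonic

def triad(degree, tonic=60):
    """Diatonic triad on a 1-based scale degree, as MIDI pitches (tonic=C4)."""
    i = degree - 1
    return [tonic + MAJOR_SCALE[(i + step) % 7] + 12 * ((i + step) // 7)
            for step in (0, 2, 4)]  # stack thirds: root, third, fifth

# I-V-vi-IV in C: C-G-Am-F
progression = [triad(d) for d in (1, 5, 6, 4)]
```

Each resulting pitch list can be written out as `pretty_midi.Note` objects for the accompaniment voices.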
Melody and motif development
A good melody has a core motif (3-6 notes) that gets developed across the piece through:
- Transposition — shift the whole motif up/down by an interval
- Inversion — flip intervals (ascending becomes descending)
- Rhythmic variation — same pitches, different durations
- Fragmentation — use just the first 2-3 notes of the motif as a callback
- Extension — add notes to the end of the motif to build tension
```python
import random

def mutate_motif(motif, shift=0, invert=False, shuffle_tail=False):
    m = [n + shift for n in motif]  # transposition by `shift` semitones
    if invert:
        anchor = m[0]
        m = [anchor - (n - anchor) for n in m]  # mirror intervals around the first note
    if shuffle_tail and len(m) > 2:
        tail = m[2:]
        random.shuffle(tail)  # keep the recognizable head, vary the tail
        m = m[:2] + tail
    return m
```
This is what separates "a random sequence of notes" from "a composition" — the ear recognizes the motif recurring in different forms and it creates coherence.
Accompaniment patterns
The left hand / lower voices need patterns that support the melody without competing. Common approaches:
- Broken chords — play chord tones one at a time in eighth notes. Works for gentle/flowing sections.
- Arpeggiated — chord tones ascending or descending. More momentum than broken chords. Good for bridges and builds.
- Block chords — all chord tones at once, sustained. Simple, powerful. Good for climaxes.
- Alberti bass — root-fifth-third-fifth repeating pattern. Classic piano accompaniment.
- Sustained pads — long held chords in string ensemble or choir. Creates a harmonic bed that fills space.
Vary the accompaniment pattern across sections — a piece that uses broken chords the whole time gets monotonous.
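Two of the patterns above, sketched as event generators. The assumed note format is `(start, pitch, duration)` tuples to be turned into `pretty_midi.Note` objects later; function names and the 2-second bar length are illustrative:

```python
def broken_chord(chord, bar_start, bar_len=2.0):
    """Chord tones one at a time in eighth notes (broken-chord pattern)."""
    step = bar_len / 8
    return [(bar_start + k * step, chord[k % len(chord)], step)
            for k in range(8)]

def alberti(chord, bar_start, bar_len=2.0):
    """Alberti bass: root-fifth-third-fifth, repeated across the bar."""
    root, third, fifth = chord[0], chord[1], chord[2]
    order = [root, fifth, third, fifth] * 2
    step = bar_len / 8
    return [(bar_start + k * step, order[k], step) for k in range(8)]
```

Swapping which generator a section uses is an easy way to vary the accompaniment without touching the harmony.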
Humanization
Raw MIDI with perfectly quantized timing and uniform velocity sounds robotic. Add subtle randomness to make it feel performed:
```python
import random

def humanize(t, vel, t_jitter=0.012, v_jitter=6):
    return (
        max(0, t + random.uniform(-t_jitter, t_jitter)),
        max(1, min(127, vel + random.randint(-v_jitter, v_jitter))),
    )
```
Also vary note durations slightly (multiply by `random.uniform(0.93, 1.02)`) — real performers don't hold every note for the exact notated length.
For piano, add sustain pedal per bar or phrase — it makes a huge difference:
```python
inst.control_changes.append(pretty_midi.ControlChange(64, 127, bar_start))     # pedal down
inst.control_changes.append(pretty_midi.ControlChange(64, 0, bar_end - 0.05))  # pedal up
```
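Applied across a whole piece, the per-bar pedal pattern can be generated in one loop. A minimal sketch (the helper name and 2-second bar length are assumptions; each tuple feeds `pretty_midi.ControlChange(*event)` on the piano instrument):

```python
def pedal_events(num_bars, bar_len=2.0, lift=0.05):
    """Sustain pedal (CC 64) down at each bar start, up just before the next bar.

    Returns (cc_number, value, time) tuples.
    """
    events = []
    for bar in range(num_bars):
        start = bar * bar_len
        events.append((64, 127, start))                 # pedal down at the bar line
        events.append((64, 0, start + bar_len - lift))  # lift just before the next bar
    return events
```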
Multi-instrument layering
When using multiple instruments, think about orchestration roles:
| Role | Purpose | Typical GM programs |
|---|---|---|
| Lead melody | Carries the main theme | Violin (40), Flute (73), Trumpet (56), Piano (0) |
| Counter-melody | Harmonic interest, fills gaps | Horn (60), Oboe (68), Viola (41) |
| Harmony pad | Sustained chords, warmth | String Ensemble (48), Choir Aahs (52), Pad (88-95) |
| Bass | Harmonic foundation, pulse | Cello (42), Contrabass (43), Fingered Bass (33) |
| Rhythm / accent | Energy, punctuation | Timpani (47), Harp (46), Vibraphone (11) |
Not every instrument plays in every section. Use the intensity map to decide which instruments enter when — start sparse, add layers as intensity builds, thin out for the outro. This is called "orchestration density" and it's one of the most effective tools for creating drama.
Set per-instrument mix levels to prevent muddiness:
```python
MIX = {"violin1": 0.95, "str_ens": 0.42, "choir": 0.45, "cello": 0.85}
```
Lead instruments get higher mix levels, pads and fills stay quieter.
Genre instrument palettes
When the user asks for a specific genre, use the full palette that genre demands — not just one or two token instruments. The difference between "sounds like an orchestral piece" and "sounds like a MIDI demo" is often just using the right number and combination of instruments.
The GM program numbers below are verified against FluidR3_GM.sf2. If the user has a different soundfont, run scripts/list_program.py on it first — program numbers and preset names may differ. Always match instruments by checking what's actually available in the soundfont rather than assuming these numbers are universal.
Epic / Cinematic Orchestral
The full symphonic palette. Layer heavily during climaxes, thin to solo lines in quiet moments.
| Section | Instruments | GM Programs |
|---|---|---|
| Strings | Violin 1, Violin 2, Viola, Cello, Contrabass | 40, 40, 41, 42, 43 |
| Woodwinds | Flute, Piccolo, Oboe, English Horn, Clarinet, Bassoon | 73, 72, 68, 69, 71, 70 |
| Brass | Trumpet, Trombone, French Horn, Tuba | 56, 57, 60, 58 |
| Percussion | Timpani, Snare Drum, Bass Drum, Cymbals, Triangle | 47, drum ch, drum ch, drum ch, drum ch |
| Color | Harp, Piano, Choir Aahs | 46, 0, 52 |
Orchestration tips: Strings carry most of the piece. Brass enters for power moments — don't use trumpet and trombone in the intro. Woodwinds add color and double melody lines an octave up. Timpani on downbeats of climax sections. Choir for the biggest emotional peaks. Harp for transitions and arpeggiated fills. GM doesn't have separate Bass Clarinet or Contrabassoon presets — use Clarinet (71) pitched an octave lower and Bassoon (70) pitched an octave lower to simulate them. Use is_drum=True for snare, bass drum, cymbals, and triangle (GM channel 10 — pitch 38=snare, 36=bass drum, 49=crash cymbal, 81=triangle).
Anime / J-Pop OST
Piano-forward with light orchestral support. Emotional, melodic, often in major keys with minor-key bridges.
| Section | Instruments | GM Programs |
|---|---|---|
| Core | Piano (Grand), Acoustic Guitar | 0, 24 |
| Strings | Violin, Cello, String Ensemble | 40, 42, 48 |
| Woodwinds | Flute, Clarinet | 73, 71 |
| Rhythm | Glockenspiel, Music Box | 9, 10 |
| Pads | Choir Aahs, Warm Pad | 52, 89 |
Orchestration tips: Piano does the heavy lifting — melody in the right hand, broken-chord or arpeggiated accompaniment in the left. Strings swell in during the chorus. Flute doubles the melody an octave up in emotional peaks. Glockenspiel or music box for sparkle in the intro/outro. Keep it intimate — this genre is about feeling, not force.
Lo-Fi / Chill Hip-Hop
Warm, slightly detuned, repetitive. Loop-based structure with subtle variation. Keep velocity low and humanization high.
| Section | Instruments | GM Programs |
|---|---|---|
| Core | Electric Piano (Rhodes), Nylon Guitar | 4, 24 |
| Bass | Fingered Bass, Synth Bass | 33, 38 |
| Texture | Vibraphone, Warm Pad, Choir Aahs | 11, 89, 52 |
| Rhythm | Drum kit (channel 10) | drum ch |
Orchestration tips: Electric piano is the soul of lo-fi — use it for jazzy chord voicings (7ths, 9ths, 13ths). Bass should be simple and deep, following the root. Vibraphone adds floating melodic fragments. Keep everything at moderate-to-low velocity (50-85 range). Heavier humanization than other genres (t_jitter=0.018, v_jitter=10) — the looseness is the aesthetic. Drum pattern: kick on 1 and 3, snare on 2 and 4, hi-hats swung on eighth notes (pitch 42=closed hat, 46=open hat, 38=snare, 36=kick).
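The drum pattern described above can be sketched as a per-bar generator (function name and the `(pitch, start, velocity)` tuple format are illustrative; notes go on a `pretty_midi.Instrument` with `is_drum=True`):

```python
import random

KICK, SNARE, CLOSED_HAT = 36, 38, 42  # GM channel-10 pitches from the tips above

def lofi_drum_bar(bar_start, bar_len=2.0, swing=0.67):
    """One bar: kick on 1 and 3, snare on 2 and 4, swung eighth-note hats.

    Returns (pitch, start, velocity) tuples at deliberately low velocities.
    """
    beat = bar_len / 4
    hits = []
    for b in range(4):
        t = bar_start + b * beat
        hits.append((KICK if b % 2 == 0 else SNARE, t, random.randint(70, 85)))
        hits.append((CLOSED_HAT, t, random.randint(50, 65)))               # on-beat hat
        hits.append((CLOSED_HAT, t + beat * swing, random.randint(45, 60)))  # swung offbeat
    return hits
```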
Synth / Electronic / Synthwave
Layered synth pads, punchy bass, arpeggiated leads. Heavy use of the synth program family.
| Section | Instruments | GM Programs |
|---|---|---|
| Lead | Square Lead, Sawtooth Lead, Synth Voice | 80, 81, 54 |
| Pads | Polysynth Pad, Space Pad, Sweep Pad | 90, 91, 95 |
| Bass | Synth Bass 1, Synth Bass 2 | 38, 39 |
| Rhythm | Drum kit (channel 10) | drum ch |
| Accent | Synth Brass, Vibraphone | 62, 11 |
Orchestration tips: Build around a thick pad foundation — layer 2-3 pad types with different octaves. Lead synth plays arpeggiated patterns (16th notes cycling through chord tones) that define the energy level. Bass should be monophonic and punchy with short note durations. Use velocity automation on pads for sidechain-like pumping. Drum pattern: four-on-the-floor kick, clap on 2 and 4, open hats on offbeats. No humanization on drums — tight quantization is intentional in electronic music.
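The arpeggiated-lead idea above can be sketched as a sixteenth-note event generator (name and `(pitch, start, end)` tuple format are illustrative; short note lengths keep the lead punchy):

```python
def arp_16ths(chord, start, duration, note_len_ratio=0.6, bpm=120):
    """Sixteenth-note arpeggio cycling through chord tones (synth-lead pattern)."""
    step = 60.0 / bpm / 4  # one sixteenth note in seconds
    events, t, k = [], start, 0
    while t < start + duration:
        pitch = chord[k % len(chord)]  # cycle root -> third -> fifth -> root ...
        events.append((pitch, t, t + step * note_len_ratio))
        t += step
        k += 1
    return events
```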
Expression, dynamics, and advanced MIDI techniques
Beyond velocity, there is a rich set of MIDI control changes and techniques that make the difference between a flat MIDI file and something that sounds performed. Use them where they fit the genre and mood.
Pedals (piano-essential):
- CC 64 (Sustain pedal) — holds all notes after release. Engage per bar or phrase. Essential for any piano piece.
- CC 66 (Sostenuto pedal) — holds only notes already pressed when pedal goes down. Useful for sustaining a bass note while playing staccato melody above it.
- CC 67 (Soft pedal / una corda) — reduces volume and brightness. Use for quiet, intimate passages.
Expression and volume:
- CC 11 (Expression) — secondary volume control for real-time swells without changing velocity. Use sinusoidal curves for string/choir swells.
- CC 7 (Main volume) — master volume per channel. Set once at the start; use CC 11 for dynamic changes.
- CC 10 (Pan) — stereo positioning (0=hard left, 64=center, 127=hard right). Place instruments across the stereo field for realistic orchestral layout (e.g., Violin 1 at 40, Violin 2 at 90, Cello at 64).
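The sinusoidal CC 11 swell mentioned above can be sketched like this (function name is illustrative; each tuple feeds `pretty_midi.ControlChange(*event)`):

```python
import math

def expression_swell(start, duration, lo=60, hi=127, steps=16):
    """Sinusoidal CC 11 swell: rises from lo to hi and back to lo.

    Ends with an explicit reset to 127 so later notes aren't stuck quiet.
    """
    events = []
    for i in range(steps + 1):
        phase = math.sin(math.pi * i / steps)  # 0 -> 1 -> 0 over the swell
        value = int(lo + (hi - lo) * phase)
        events.append((11, value, start + duration * i / steps))
    events.append((11, 127, start + duration))  # reset expression
    return events
```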
Effects:
- CC 91 (Reverb send) — room echo amount. Values 60-80 for orchestral, 30 for dry upfront piano, 100 for distant ambient strings.
- CC 93 (Chorus send) — thickens sound with detuned copies. Good on synth pads and string ensembles for width.
Pitch and vibrato:
- Pitch bend — smooth pitch slides via instrument.pitch_bends. Range -8192 to +8191 (typically ±2 semitones). Great for guitar bends, vocal slides, expressive string portamento.
- CC 1 (Modulation / vibrato) — controls vibrato depth. Oscillate the value for natural vibrato on sustained string and wind notes.
- CC 65 + CC 5 (Portamento) — glides smoothly between consecutive notes. CC 65 toggles on/off, CC 5 sets glide speed.
Articulation (controlled by note duration):
- Legato — overlap consecutive notes by ~5% for smooth connection
- Staccato — short notes at ~40% of beat duration for detached, bouncy feel
- Tenuto — full-length notes at ~98% duration for sustained emphasis
- Accent — higher velocity on specific beats (downbeats, syncopation)
- Glissando — rapid sequential chromatic notes to simulate a slide across keys
Compositional techniques:
- Counterpoint — two or more independent melodic lines interlocking rhythmically
- Key modulation — shifting harmonic center mid-piece (e.g., transpose everything +4 semitones for a key change up a major third)
- Dynamic tempo (ritardando / accelerando) — gradually slowing down or speeding up by computing note positions with variable BPM
- Swing — unevenly spacing eighth notes (e.g., 67/33 ratio) for jazz feel
- Time signature changes — waltz (3/4), march (2/4), compound (6/8) via midi.time_signature_changes
- Scale systems — constrain melodies to specific scales (major, minor, pentatonic, blues, dorian, etc.) for tonal coherence
- Markov chains — probabilistic note-to-note transitions for generative melody that sounds intentional
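As one concrete example, the swing technique from the list can be sketched as a timing transform (function name is illustrative; the 67/33 ratio matches the feel described above):

```python
def swing_eighths(beat_times, ratio=0.67):
    """Apply swing to straight eighth notes.

    Within each beat, the on-beat eighth stays put and the offbeat eighth is
    delayed to land `ratio` of the way through the beat. `beat_times` holds
    beat-start times; returns the swung eighth-note onsets.
    """
    onsets = []
    for i, beat in enumerate(beat_times):
        # Infer this beat's length from its neighbor (last beat reuses the previous gap)
        beat_len = (beat_times[i + 1] - beat) if i + 1 < len(beat_times) else \
                   (beat - beat_times[i - 1])
        onsets.append(beat)                     # on-beat eighth
        onsets.append(beat + beat_len * ratio)  # offbeat eighth, pushed late
    return onsets
```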
Not every technique belongs in every piece — pick what serves the music. But being aware of these tools means you can reach for them when a composition needs that extra expressiveness. See explain.md in the project root for full code examples of each technique.
4) Volume management
Uneven volume — some parts blasting, others barely audible — is the most common problem in generated MIDI. It happens because of several factors stacking on top of each other. Address all of them.
Why it happens
- Soundfont imbalance — different instruments in the soundfont are sampled at different base volumes. A trumpet patch might be 2x louder than a flute patch at the same velocity. You can't assume velocity 80 on violin and velocity 80 on timpani will sound equally loud.
- Note stacking — when 5 instruments play simultaneously in a climax, their waveforms add up and the section becomes much louder than a solo passage. The more voices playing at once, the louder the mix gets.
- Velocity range abuse — using the full 1-127 range without thinking about it means some notes are near-silent (vel 30) while others clip (vel 127). The difference between velocity 40 and 120 is massive.
- Expression (CC 11) not managed — if you set expression to 60 for a swell and never reset it, everything after stays quiet.
How to fix it
Step 1: Set CC 7 (main volume) per instrument at time 0.
This is the master volume knob for each channel. Use it to pre-balance instruments before any notes play. Think of it like a mixing board — set the faders before the song starts.
```python
CHANNEL_VOLUMES = {
    "violin1": 100,
    "violin2": 85,
    "viola": 90,
    "cello": 95,
    "contrabass": 88,
    "flute": 105,   # flute is naturally quieter in most soundfonts, boost it
    "trumpet": 78,  # trumpet is naturally loud, pull it back
    "timpani": 75,  # percussion cuts through easily
    "str_ens": 70,  # pads should sit underneath
    "choir": 72,
    "harp": 95,
    "piano": 100,
}

for name, inst in instruments.items():
    vol = CHANNEL_VOLUMES.get(name, 90)
    inst.control_changes.append(
        pretty_midi.ControlChange(number=7, value=vol, time=0.0)
    )
```
Step 2: Keep velocities in a controlled range per instrument role.
Don't let lead instruments go below 65 or above 110. Don't let pads go above 80. This prevents the wild swings.
```python
VELOCITY_RANGES = {
    "lead": (70, 110),    # melody instruments
    "bass": (75, 105),    # needs to be consistently present
    "pad": (45, 80),      # background, never dominant
    "accent": (80, 120),  # percussion, short hits — can be louder
}

def clamp_velocity(vel, role="lead"):
    lo, hi = VELOCITY_RANGES[role]
    return max(lo, min(hi, int(vel)))
```
Step 3: Scale velocity by instrument count.
When many instruments play at once, reduce each one's velocity so the sum stays controlled. A simple rule: divide a "budget" across active instruments.
```python
def section_velocity(base_vel, num_active_instruments):
    if num_active_instruments <= 2:
        return base_vel
    # Reduce by ~8% per extra instrument beyond 2
    reduction = 0.92 ** (num_active_instruments - 2)
    return max(40, int(base_vel * reduction))
```
Step 4: Always reset CC 11 (expression) to 127 after swells.
If you use expression for a crescendo or decrescendo, explicitly reset it when done. Otherwise every note after the swell plays at whatever value you left it at.
```python
# After a string swell ends at t=8.0, reset expression
inst.control_changes.append(
    pretty_midi.ControlChange(number=11, value=127, time=8.0)
)
```
Step 5: Normalize audio after rendering.
Even with good MIDI-level balancing, always normalize the final WAV. This catches any remaining peaks and ensures consistent output volume.
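The normalization idea can be shown on bare sample values. A minimal sketch (in practice you would apply the same gain math to the NumPy array before writing the WAV, as the render code below does):

```python
def peak_normalize(samples, headroom=0.9):
    """Scale float samples so the loudest peak sits at `headroom` of full scale."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0:
        return list(samples)  # silence stays silence; avoid dividing by zero
    gain = headroom / peak
    return [s * gain for s in samples]
```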
5) Render policy
Always render final MIDI to WAV. Two options depending on quality needs:
Option A: Python API (simple, no effects)
```python
import numpy as np
import scipy.io.wavfile

audio = midi.fluidsynth(fs=44100, sf2_path=sf2_path)
peak = np.max(np.abs(audio))
if peak > 0:
    audio = audio / peak * 0.9  # normalize to 90% of full scale
audio = (audio * 32767).astype(np.int16)
scipy.io.wavfile.write(output_wav, 44100, audio)
```
Option B: FluidSynth CLI (better quality — adds reverb and chorus)
```python
import subprocess

cmd = [
    "fluidsynth", "-n", "-i",
    "-F", output_wav, "-T", "wav",
    "-r", "44100",
    "-g", "0.4",
    "-R", "1", "-C", "1",
    "-o", "synth.reverb.room-size=0.75",
    "-o", "synth.reverb.level=0.6",
    "-o", "synth.chorus.level=3.0",
    "-o", "synth.chorus.depth=6.0",
    soundfont_path, output_midi,
]
subprocess.run(cmd, check=True)
```
Use Option B for multi-instrument pieces — the reverb and chorus add crucial spatial depth that the Python API doesn't provide. Use Option A for quick previews or when FluidSynth CLI isn't available.
The `-g 0.4` (gain) flag prevents clipping on dense arrangements. Adjust it upward for sparse solo pieces.
For MP3 conversion, use the bundled script:
```shell
python scripts/wav2mp3.py output.wav
```
Remix-Music
Use this when the user provides an existing MIDI and wants it re-orchestrated, rearranged, or transformed.
1) Remix workspace policy
Create a dedicated folder in the project root:
```
remix_faded_orchestral_keep_timing/
```
Copy (never move) the source MIDI into the remix folder.
Expected files:
- `source.mid` (copy of the original)
- `remix.py`
- `remix.mid`
- `remix.wav` (required)
- `remix.mp3` (optional)
2) Hard remix constraints
Unless the user explicitly asks to change timing:
- Keep rhythm steps identical to original
- Keep melody note timing identical (start/end preserved)
- Keep rhythmic grid and groove alignment
These constraints exist because timing is the soul of a piece — changing it makes it a different song, not a remix. Re-orchestration (new instruments, new harmony, new dynamics) can transform a piece completely while honoring the original groove.
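The constraint can be made mechanical: whatever else changes, note start/end times are copied through untouched. A minimal sketch (function name and the `(pitch, start, end, velocity)` tuple format are illustrative):

```python
def reinstrument(notes, pitch_shift=0, velocity_scale=1.0):
    """Copy notes to a new instrument part, preserving every start/end time.

    Pitch (e.g. octave shifts for a new register) and velocity may change;
    timing may not. Notes are (pitch, start, end, velocity) tuples.
    """
    out = []
    for pitch, start, end, vel in notes:
        new_vel = max(1, min(127, int(vel * velocity_scale)))  # clamp to MIDI range
        out.append((pitch + pitch_shift, start, end, new_vel))
    return out
```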
Validate timing preservation after every remix:
```shell
python scripts/compare_midi_timing.py source.mid remix.mid
```
3) Analysis-first workflow
Before writing any remix code, understand the source material:
```shell
python scripts/read_midi.py "source.mid" 30
python scripts/midi4llm.py "source.mid" 30
python scripts/midi_summary.py "source.mid"
python scripts/list_program.py "soundfonts/YourFont.sf2"
```
- `read_midi.py` — detailed per-note data (pitch, velocity, timing)
- `midi4llm.py` — compact step-based format optimized for LLM reasoning
- `midi_summary.py` — high-level overview (tempo, instruments, note counts, ranges)
- `list_program.py` — discover what's available in the target soundfont
Read the analysis output carefully before planning the remix. Identify:
- Where the melody lives (which track, pitch range)
- What the bass is doing (pattern, register)
- Section boundaries (look for density changes, key changes, tempo shifts)
- Which notes are timing-critical anchors vs. ornamental fills
4) Remix generation approach
Use pretty_midi to re-orchestrate while preserving timing:
- Extract timing-critical melody and rhythm anchors from the source
- Re-instrument: assign notes to new GM programs based on their role
- Add harmony layers and counter-melodies that fit the original rhythm grid
- Shape dynamics with the section intensity approach from Create-Music
- Add expression (CC 11 swells, CC 91 reverb) to bring the new arrangement to life
Always render the final remix to WAV using the FluidSynth CLI method (Option B above) for best quality.
Scripts
Bundled utilities live in the skill's scripts/ folder; use them instead of writing one-off equivalents:
| Script | Purpose | When to use |
|---|---|---|
| `scripts/read_midi.py <file> [max_sec]` | Per-note detail dump | Before remixing; debugging note issues |
| `scripts/midi4llm.py <file> [max_sec]` | Step-based compact format | Before remixing; understanding rhythm |
| `scripts/midi_summary.py <file>` | High-level overview | Quick check of any MIDI |
| `scripts/compare_midi_timing.py <a> <b>` | Timing diff validation | After every remix to verify constraints |
| `scripts/list_program.py <sf2> [--contains X]` | Soundfont preset browser | When choosing instruments from a custom sf2 |
| `scripts/wav2mp3.py <wav> [mp3]` | WAV to MP3 via ffmpeg | When the user wants MP3 output |
Additional behavior
- Ad-hoc Python one-liners in bash are fine for extra MIDI inspection; use them liberally while remixing so you understand the music's timing
- Keep outputs musically coherent and accurate to the prompt
- When the user's prompt is vague ("make something cool"), ask about mood, tempo, duration, and instruments before composing — but have sensible defaults ready if they just want you to go for it
- If the user references a specific genre or artist, lean into characteristic elements of that style (chord voicings, tempo range, typical instruments, rhythmic patterns)
# Supported AI Coding Agents
This skill follows the SKILL.md standard and works with all major AI coding agents that support it.