Install the specific skill from the multi-skill repository:

```shell
npx skills add EsshUwU/music-skill --skill "music-skill"
```
# Description
Generate original music and remix existing MIDI using Python, pretty_midi, and FluidSynth. Use this whenever users ask to compose, write, or generate music of any kind — piano pieces, orchestral arrangements, soundtracks, beats, melodies, lo-fi, anime OSTs, cinematic scores, synth tracks, or any other genre. Also use for remixing, re-orchestrating, or arranging existing MIDI files. Triggers on any music-composition request even if the user doesn't explicitly say "MIDI" or "compose" — phrases like "make me a song," "create a beat," "write something that sounds like X," or "orchestrate this" all count.
# SKILL.md
```yaml
name: music-skill
description: Generate original music and remix existing MIDI using Python, pretty_midi, and FluidSynth. Use this whenever users ask to compose, write, or generate music of any kind — piano pieces, orchestral arrangements, soundtracks, beats, melodies, lo-fi, anime OSTs, cinematic scores, synth tracks, or any other genre. Also use for remixing, re-orchestrating, or arranging existing MIDI files. Triggers on any music-composition request even if the user doesn't explicitly say "MIDI" or "compose" — phrases like "make me a song," "create a beat," "write something that sounds like X," or "orchestrate this" all count.
```
Music Skill
Two workflows, in order of frequency:
1. Create-Music — generate brand-new compositions from a text prompt
2. Remix-Music — re-orchestrate or rearrange an existing MIDI file
Create-Music
1) Environment and dependency checks (always first)
Verify before writing any composition code:
```shell
python -c "import pretty_midi, numpy, scipy, mido; print('python deps ok')"
python -c "import fluidsynth; print('pyfluidsynth ok')"
fluidsynth --version
ffmpeg -version
```
Soundfont setup
Check for .sf2 files in soundfonts/. If the directory is missing, create it:
```shell
mkdir -p soundfonts
```
If no .sf2 file exists, tell the user they need one and suggest these free options:
- FluidR3_GM — best general-purpose GM soundfont (141 MB). Search "FluidR3_GM.sf2 download"
- GeneralUser GS — lighter alternative (~30 MB). Available at schristiancollins.com
- MuseScore_General — ships with MuseScore, high quality
When the user has a custom .sf2, run scripts/list_program.py to discover available instruments:
```shell
python scripts/list_program.py "soundfonts/YourFont.sf2"
```
This matters because custom soundfonts may map programs differently than GM standard, and some banks contain gems that won't show up in a GM reference table.
2) Output folder policy
Create a descriptive folder in the project root for each request:
```
cool_synth_45sec_high_and_slow_parts/
cinematic_piano_60sec_soft_intro_big_finale/
anime_ending_credits_f_major/
```
Put all outputs in that folder: generate.py, *.mid, *.wav, and optionally *.mp3.
Create the folder as the first step after the user's request; if you then need to run scripts or save terminal output, save them in that folder too.
3) Composition approach
Use pretty_midi to construct compositions note-by-note. This gives full control over timing, velocity, and articulation — things that make the difference between "sounds like a test file" and "sounds like music."
Reference: https://craffel.github.io/pretty-midi/
Section structure and intensity
Every piece needs an arc. Map the user's prompt into sections and assign each an intensity multiplier that shapes velocity and instrumentation density across the piece:
```python
SECTIONS = [
    (0, 8, "intro"),
    (8, 24, "verse"),
    (24, 40, "chorus"),
    (40, 52, "climax"),
    (52, 60, "outro"),
]
INTENSITY = {
    "intro": 0.40,
    "verse": 0.55,
    "chorus": 0.85,
    "climax": 1.00,
    "outro": 0.35,
}
```
Use intensity to scale velocities: `vel = int(base_velocity * INTENSITY[section] * instrument_mix_level)`. This creates natural dynamics — the climax is loud, the intro and outro breathe.
Sections don't need to be named exactly like this — for a lo-fi piece you might have "chill_loop / variation / breakdown / chill_loop_reprise." The point is: every piece has shape, not just a flat sequence of notes.
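Concretely, the intensity map can drive a small velocity helper. This is a minimal sketch (the function names `section_at` and `scaled_velocity` are illustrative, not part of the skill):

```python
# Section map and intensity multipliers, mirroring the tables above.
SECTIONS = [
    (0, 8, "intro"),
    (8, 24, "verse"),
    (24, 40, "chorus"),
    (40, 52, "climax"),
    (52, 60, "outro"),
]
INTENSITY = {"intro": 0.40, "verse": 0.55, "chorus": 0.85,
             "climax": 1.00, "outro": 0.35}

def section_at(t):
    """Name of the section active at time t (seconds)."""
    for start, end, name in SECTIONS:
        if start <= t < end:
            return name
    return SECTIONS[-1][2]  # clamp past-the-end times to the outro

def scaled_velocity(t, base_velocity=96, mix_level=1.0):
    """Velocity shaped by the section's intensity and the instrument's mix level."""
    vel = int(base_velocity * INTENSITY[section_at(t)] * mix_level)
    return max(1, min(127, vel))  # keep inside the valid MIDI velocity range
```

Every note a composition script emits can pass through a helper like this, so dynamics follow the arc automatically instead of being set per note.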
Chord progressions
Choose progressions that match the mood. Some reliable starting points:
| Mood | Progression | Example key |
|---|---|---|
| Bright / pop / anime | I – V – vi – IV | C: C-G-Am-F |
| Epic / cinematic | i – VI – III – VII | Am: Am-F-C-G |
| Nostalgic / bittersweet | I – vi – IV – V | F: F-Dm-Bb-C |
| Tense / dark | i – iv – v – i | Dm: Dm-Gm-Am-Dm |
| Jazz / sophisticated | ii – V – I – vi | C: Dm7-G7-Cmaj7-Am7 |
| Hopeful resolution | IV – V – iii – vi | G: C-D-Bm-Em |
Don't be afraid to borrow from multiple progressions or add passing chords. The chord progression is the harmonic skeleton — get it right and even simple melodies will sound good over it.
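As a sketch of how a progression row becomes concrete pitches, here is a minimal diatonic-triad expander for a major key (the helper name `triad` is illustrative; a real piece would also handle sevenths, minor keys, and borrowed chords):

```python
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets from the tonic

def triad(degree, tonic=60):
    """Diatonic triad on a 1-based scale degree, as MIDI pitches (tonic=C4)."""
    i = degree - 1
    return [tonic + MAJOR_SCALE[(i + step) % 7] + 12 * ((i + step) // 7)
            for step in (0, 2, 4)]  # stack thirds: root, third, fifth

# I-V-vi-IV in C: C-G-Am-F
progression = [triad(d) for d in (1, 5, 6, 4)]
```

Each resulting pitch list can be written out as `pretty_midi.Note` objects for the accompaniment voices.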
Melody and motif development
A good melody has a core motif (3-6 notes) that gets developed across the piece through:
- Transposition — shift the whole motif up/down by an interval
- Inversion — flip intervals (ascending becomes descending)
- Rhythmic variation — same pitches, different durations
- Fragmentation — use just the first 2-3 notes of the motif as a callback
- Extension — add notes to the end of the motif to build tension
```python
import random

def mutate_motif(motif, shift=0, invert=False, shuffle_tail=False):
    m = [n + shift for n in motif]  # transposition by `shift` semitones
    if invert:
        anchor = m[0]
        m = [anchor - (n - anchor) for n in m]  # mirror intervals around the first note
    if shuffle_tail and len(m) > 2:
        tail = m[2:]
        random.shuffle(tail)  # keep the recognizable head, vary the tail
        m = m[:2] + tail
    return m
```
This is what separates "a random sequence of notes" from "a composition" — the ear recognizes the motif recurring in different forms and it creates coherence.
Accompaniment patterns
The left hand / lower voices need patterns that support the melody without competing. Common approaches:
- Broken chords — play chord tones one at a time in eighth notes. Works for gentle/flowing sections.
- Arpeggiated — chord tones ascending or descending. More momentum than broken chords. Good for bridges and builds.
- Block chords — all chord tones at once, sustained. Simple, powerful. Good for climaxes.
- Alberti bass — root-fifth-third-fifth repeating pattern. Classic piano accompaniment.
- Sustained pads — long held chords in string ensemble or choir. Creates a harmonic bed that fills space.
Vary the accompaniment pattern across sections — a piece that uses broken chords the whole time gets monotonous.
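Two of the patterns above, sketched as event generators. The assumed note format is `(start, pitch, duration)` tuples to be turned into `pretty_midi.Note` objects later; function names and the 2-second bar length are illustrative:

```python
def broken_chord(chord, bar_start, bar_len=2.0):
    """Chord tones one at a time in eighth notes (broken-chord pattern)."""
    step = bar_len / 8
    return [(bar_start + k * step, chord[k % len(chord)], step)
            for k in range(8)]

def alberti(chord, bar_start, bar_len=2.0):
    """Alberti bass: root-fifth-third-fifth, repeated across the bar."""
    root, third, fifth = chord[0], chord[1], chord[2]
    order = [root, fifth, third, fifth] * 2
    step = bar_len / 8
    return [(bar_start + k * step, order[k], step) for k in range(8)]
```

Swapping which generator a section uses is an easy way to vary the accompaniment without touching the harmony.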
Humanization
Raw MIDI with perfectly quantized timing and uniform velocity sounds robotic. Add subtle randomness to make it feel performed:
```python
import random

def humanize(t, vel, t_jitter=0.012, v_jitter=6):
    return (
        max(0, t + random.uniform(-t_jitter, t_jitter)),
        max(1, min(127, vel + random.randint(-v_jitter, v_jitter))),
    )
```
Also vary note durations slightly (multiply by `random.uniform(0.93, 1.02)`) — real performers don't hold every note for the exact notated length.
For piano, add sustain pedal per bar or phrase — it makes a huge difference:
```python
inst.control_changes.append(pretty_midi.ControlChange(64, 127, bar_start))     # pedal down
inst.control_changes.append(pretty_midi.ControlChange(64, 0, bar_end - 0.05))  # pedal up
```
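Applied across a whole piece, the per-bar pedal pattern can be generated in one loop. A minimal sketch (the helper name and 2-second bar length are assumptions; each tuple feeds `pretty_midi.ControlChange(*event)` on the piano instrument):

```python
def pedal_events(num_bars, bar_len=2.0, lift=0.05):
    """Sustain pedal (CC 64) down at each bar start, up just before the next bar.

    Returns (cc_number, value, time) tuples.
    """
    events = []
    for bar in range(num_bars):
        start = bar * bar_len
        events.append((64, 127, start))                 # pedal down at the bar line
        events.append((64, 0, start + bar_len - lift))  # lift just before the next bar
    return events
```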
Multi-instrument layering
When using multiple instruments, think about orchestration roles:
| Role | Purpose | Typical GM programs |
|---|---|---|
| Lead melody | Carries the main theme | Violin (40), Flute (73), Trumpet (56), Piano (0) |
| Counter-melody | Harmonic interest, fills gaps | Horn (60), Oboe (68), Viola (41) |
| Harmony pad | Sustained chords, warmth | String Ensemble (48), Choir Aahs (52), Pad (88-95) |
| Bass | Harmonic foundation, pulse | Cello (42), Contrabass (43), Fingered Bass (33) |
| Rhythm / accent | Energy, punctuation | Timpani (47), Harp (46), Vibraphone (11) |
Not every instrument plays in every section. Use the intensity map to decide which instruments enter when — start sparse, add layers as intensity builds, thin out for the outro. This is called "orchestration density" and it's one of the most effective tools for creating drama.
Set per-instrument mix levels to prevent muddiness:
```python
MIX = {"violin1": 0.95, "str_ens": 0.42, "choir": 0.45, "cello": 0.85}
```
Lead instruments get higher mix levels, pads and fills stay quieter.
Genre instrument palettes
When the user asks for a specific genre, use the full palette that genre demands — not just one or two token instruments. The difference between "sounds like an orchestral piece" and "sounds like a MIDI demo" is often just using the right number and combination of instruments.
The GM program numbers below are verified against FluidR3_GM.sf2. If the user has a different soundfont, run scripts/list_program.py on it first — program numbers and preset names may differ. Always match instruments by checking what's actually available in the soundfont rather than assuming these numbers are universal.
Epic / Cinematic Orchestral
The full symphonic palette. Layer heavily during climaxes, thin to solo lines in quiet moments.
| Section | Instruments | GM Programs |
|---|---|---|
| Strings | Violin 1, Violin 2, Viola, Cello, Contrabass | 40, 40, 41, 42, 43 |
| Woodwinds | Flute, Piccolo, Oboe, English Horn, Clarinet, Bassoon | 73, 72, 68, 69, 71, 70 |
| Brass | Trumpet, Trombone, French Horn, Tuba | 56, 57, 60, 58 |
| Percussion | Timpani, Snare Drum, Bass Drum, Cymbals, Triangle | 47, drum ch, drum ch, drum ch, drum ch |
| Color | Harp, Piano, Choir Aahs | 46, 0, 52 |
Orchestration tips: Strings carry most of the piece. Brass enters for power moments — don't use trumpet and trombone in the intro. Woodwinds add color and double melody lines an octave up. Timpani on downbeats of climax sections. Choir for the biggest emotional peaks. Harp for transitions and arpeggiated fills. GM doesn't have separate Bass Clarinet or Contrabassoon presets — use Clarinet (71) pitched an octave lower and Bassoon (70) pitched an octave lower to simulate them. Use is_drum=True for snare, bass drum, cymbals, and triangle (GM channel 10 — pitch 38=snare, 36=bass drum, 49=crash cymbal, 81=triangle).
Anime / J-Pop OST
Piano-forward with light orchestral support. Emotional, melodic, often in major keys with minor-key bridges.
| Section | Instruments | GM Programs |
|---|---|---|
| Core | Piano (Grand), Acoustic Guitar | 0, 24 |
| Strings | Violin, Cello, String Ensemble | 40, 42, 48 |
| Woodwinds | Flute, Clarinet | 73, 71 |
| Rhythm | Glockenspiel, Music Box | 9, 10 |
| Pads | Choir Aahs, Warm Pad | 52, 89 |
Orchestration tips: Piano does the heavy lifting — melody in the right hand, broken-chord or arpeggiated accompaniment in the left. Strings swell in during the chorus. Flute doubles the melody an octave up in emotional peaks. Glockenspiel or music box for sparkle in the intro/outro. Keep it intimate — this genre is about feeling, not force.
Lo-Fi / Chill Hip-Hop
Warm, slightly detuned, repetitive. Loop-based structure with subtle variation. Keep velocity low and humanization high.
| Section | Instruments | GM Programs |
|---|---|---|
| Core | Electric Piano (Rhodes), Nylon Guitar | 4, 24 |
| Bass | Fingered Bass, Synth Bass | 33, 38 |
| Texture | Vibraphone, Warm Pad, Choir Aahs | 11, 89, 52 |
| Rhythm | Drum kit (channel 10) | drum ch |
Orchestration tips: Electric piano is the soul of lo-fi — use it for jazzy chord voicings (7ths, 9ths, 13ths). Bass should be simple and deep, following the root. Vibraphone adds floating melodic fragments. Keep everything at moderate-to-low velocity (50-85 range). Heavier humanization than other genres (t_jitter=0.018, v_jitter=10) — the looseness is the aesthetic. Drum pattern: kick on 1 and 3, snare on 2 and 4, hi-hats swung on eighth notes (pitch 42=closed hat, 46=open hat, 38=snare, 36=kick).
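The drum pattern described above can be sketched as a per-bar generator (function name and the `(pitch, start, velocity)` tuple format are illustrative; notes go on a `pretty_midi.Instrument` with `is_drum=True`):

```python
import random

KICK, SNARE, CLOSED_HAT = 36, 38, 42  # GM channel-10 pitches from the tips above

def lofi_drum_bar(bar_start, bar_len=2.0, swing=0.67):
    """One bar: kick on 1 and 3, snare on 2 and 4, swung eighth-note hats.

    Returns (pitch, start, velocity) tuples at deliberately low velocities.
    """
    beat = bar_len / 4
    hits = []
    for b in range(4):
        t = bar_start + b * beat
        hits.append((KICK if b % 2 == 0 else SNARE, t, random.randint(70, 85)))
        hits.append((CLOSED_HAT, t, random.randint(50, 65)))               # on-beat hat
        hits.append((CLOSED_HAT, t + beat * swing, random.randint(45, 60)))  # swung offbeat
    return hits
```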
Synth / Electronic / Synthwave
Layered synth pads, punchy bass, arpeggiated leads. Heavy use of the synth program family.
| Section | Instruments | GM Programs |
|---|---|---|
| Lead | Square Lead, Sawtooth Lead, Synth Voice | 80, 81, 54 |
| Pads | Polysynth Pad, Space Pad, Sweep Pad | 90, 91, 95 |
| Bass | Synth Bass 1, Synth Bass 2 | 38, 39 |
| Rhythm | Drum kit (channel 10) | drum ch |
| Accent | Synth Brass, Vibraphone | 62, 11 |
Orchestration tips: Build around a thick pad foundation — layer 2-3 pad types with different octaves. Lead synth plays arpeggiated patterns (16th notes cycling through chord tones) that define the energy level. Bass should be monophonic and punchy with short note durations. Use velocity automation on pads for sidechain-like pumping. Drum pattern: four-on-the-floor kick, clap on 2 and 4, open hats on offbeats. No humanization on drums — tight quantization is intentional in electronic music.
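The arpeggiated-lead idea above can be sketched as a sixteenth-note event generator (name and `(pitch, start, end)` tuple format are illustrative; short note lengths keep the lead punchy):

```python
def arp_16ths(chord, start, duration, note_len_ratio=0.6, bpm=120):
    """Sixteenth-note arpeggio cycling through chord tones (synth-lead pattern)."""
    step = 60.0 / bpm / 4  # one sixteenth note in seconds
    events, t, k = [], start, 0
    while t < start + duration:
        pitch = chord[k % len(chord)]  # cycle root -> third -> fifth -> root ...
        events.append((pitch, t, t + step * note_len_ratio))
        t += step
        k += 1
    return events
```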
Expression, dynamics, and advanced MIDI techniques
Beyond velocity, there is a rich set of MIDI control changes and techniques that make the difference between a flat MIDI file and something that sounds performed. Use them where they fit the genre and mood.
Pedals (piano-essential):
- CC 64 (Sustain pedal) — holds all notes after release. Engage per bar or phrase. Essential for any piano piece.
- CC 66 (Sostenuto pedal) — holds only notes already pressed when pedal goes down. Useful for sustaining a bass note while playing staccato melody above it.
- CC 67 (Soft pedal / una corda) — reduces volume and brightness. Use for quiet, intimate passages.
Expression and volume:
- CC 11 (Expression) — secondary volume control for real-time swells without changing velocity. Use sinusoidal curves for string/choir swells.
- CC 7 (Main volume) — master volume per channel. Set once at the start; use CC 11 for dynamic changes.
- CC 10 (Pan) — stereo positioning (0=hard left, 64=center, 127=hard right). Place instruments across the stereo field for realistic orchestral layout (e.g., Violin 1 at 40, Violin 2 at 90, Cello at 64).
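The sinusoidal CC 11 swell mentioned above can be sketched like this (function name is illustrative; each tuple feeds `pretty_midi.ControlChange(*event)`):

```python
import math

def expression_swell(start, duration, lo=60, hi=127, steps=16):
    """Sinusoidal CC 11 swell: rises from lo to hi and back to lo.

    Ends with an explicit reset to 127 so later notes aren't stuck quiet.
    """
    events = []
    for i in range(steps + 1):
        phase = math.sin(math.pi * i / steps)  # 0 -> 1 -> 0 over the swell
        value = int(lo + (hi - lo) * phase)
        events.append((11, value, start + duration * i / steps))
    events.append((11, 127, start + duration))  # reset expression
    return events
```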
Effects:
- CC 91 (Reverb send) — room echo amount. Values 60-80 for orchestral, 30 for dry upfront piano, 100 for distant ambient strings.
- CC 93 (Chorus send) — thickens sound with detuned copies. Good on synth pads and string ensembles for width.
Pitch and vibrato:
- Pitch bend — smooth pitch slides via instrument.pitch_bends. Range -8192 to +8191 (typically ±2 semitones). Great for guitar bends, vocal slides, expressive string portamento.
- CC 1 (Modulation / vibrato) — controls vibrato depth. Oscillate the value for natural vibrato on sustained string and wind notes.
- CC 65 + CC 5 (Portamento) — glides smoothly between consecutive notes. CC 65 toggles on/off, CC 5 sets glide speed.
Articulation (controlled by note duration):
- Legato — overlap consecutive notes by ~5% for smooth connection
- Staccato — short notes at ~40% of beat duration for detached, bouncy feel
- Tenuto — full-length notes at ~98% duration for sustained emphasis
- Accent — higher velocity on specific beats (downbeats, syncopation)
- Glissando — rapid sequential chromatic notes to simulate a slide across keys
Compositional techniques:
- Counterpoint — two or more independent melodic lines interlocking rhythmically
- Key modulation — shifting harmonic center mid-piece (e.g., transpose everything +4 semitones for a key change up a major third)
- Dynamic tempo (ritardando / accelerando) — gradually slowing down or speeding up by computing note positions with variable BPM
- Swing — unevenly spacing eighth notes (e.g., 67/33 ratio) for jazz feel
- Time signature changes — waltz (3/4), march (2/4), compound (6/8) via midi.time_signature_changes
- Scale systems — constrain melodies to specific scales (major, minor, pentatonic, blues, dorian, etc.) for tonal coherence
- Markov chains — probabilistic note-to-note transitions for generative melody that sounds intentional
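As one concrete example, the swing technique from the list can be sketched as a timing transform (function name is illustrative; the 67/33 ratio matches the feel described above):

```python
def swing_eighths(beat_times, ratio=0.67):
    """Apply swing to straight eighth notes.

    Within each beat, the on-beat eighth stays put and the offbeat eighth is
    delayed to land `ratio` of the way through the beat. `beat_times` holds
    beat-start times; returns the swung eighth-note onsets.
    """
    onsets = []
    for i, beat in enumerate(beat_times):
        # Infer this beat's length from its neighbor (last beat reuses the previous gap)
        beat_len = (beat_times[i + 1] - beat) if i + 1 < len(beat_times) else \
                   (beat - beat_times[i - 1])
        onsets.append(beat)                     # on-beat eighth
        onsets.append(beat + beat_len * ratio)  # offbeat eighth, pushed late
    return onsets
```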
Not every technique belongs in every piece — pick what serves the music. But being aware of these tools means you can reach for them when a composition needs that extra expressiveness. See explain.md in the project root for full code examples of each technique.
4) Volume management
Uneven volume — some parts blasting, others barely audible — is the most common problem in generated MIDI. It happens because of several factors stacking on top of each other. Address all of them.
Why it happens
- Soundfont imbalance — different instruments in the soundfont are sampled at different base volumes. A trumpet patch might be 2x louder than a flute patch at the same velocity. You can't assume velocity 80 on violin and velocity 80 on timpani will sound equally loud.
- Note stacking — when 5 instruments play simultaneously in a climax, their waveforms add up and the section becomes much louder than a solo passage. The more voices playing at once, the louder the mix gets.
- Velocity range abuse — using the full 1-127 range without thinking about it means some notes are near-silent (vel 30) while others clip (vel 127). The difference between velocity 40 and 120 is massive.
- Expression (CC 11) not managed — if you set expression to 60 for a swell and never reset it, everything after stays quiet.
How to fix it
Step 1: Set CC 7 (main volume) per instrument at time 0.
This is the master volume knob for each channel. Use it to pre-balance instruments before any notes play. Think of it like a mixing board — set the faders before the song starts.
```python
CHANNEL_VOLUMES = {
    "violin1": 100,
    "violin2": 85,
    "viola": 90,
    "cello": 95,
    "contrabass": 88,
    "flute": 105,   # flute is naturally quieter in most soundfonts, boost it
    "trumpet": 78,  # trumpet is naturally loud, pull it back
    "timpani": 75,  # percussion cuts through easily
    "str_ens": 70,  # pads should sit underneath
    "choir": 72,
    "harp": 95,
    "piano": 100,
}

for name, inst in instruments.items():
    vol = CHANNEL_VOLUMES.get(name, 90)
    inst.control_changes.append(
        pretty_midi.ControlChange(number=7, value=vol, time=0.0)
    )
```
Step 2: Keep velocities in a controlled range per instrument role.
Don't let lead instruments go below 65 or above 110. Don't let pads go above 80. This prevents the wild swings.
```python
VELOCITY_RANGES = {
    "lead": (70, 110),    # melody instruments
    "bass": (75, 105),    # needs to be consistently present
    "pad": (45, 80),      # background, never dominant
    "accent": (80, 120),  # percussion, short hits — can be louder
}

def clamp_velocity(vel, role="lead"):
    lo, hi = VELOCITY_RANGES[role]
    return max(lo, min(hi, int(vel)))
```
Step 3: Scale velocity by instrument count.
When many instruments play at once, reduce each one's velocity so the sum stays controlled. A simple rule: divide a "budget" across active instruments.
```python
def section_velocity(base_vel, num_active_instruments):
    if num_active_instruments <= 2:
        return base_vel
    # Reduce by ~8% per extra instrument beyond 2
    reduction = 0.92 ** (num_active_instruments - 2)
    return max(40, int(base_vel * reduction))
```
Step 4: Always reset CC 11 (expression) to 127 after swells.
If you use expression for a crescendo or decrescendo, explicitly reset it when done. Otherwise every note after the swell plays at whatever value you left it at.
```python
# After a string swell ends at t=8.0, reset expression
inst.control_changes.append(
    pretty_midi.ControlChange(number=11, value=127, time=8.0)
)
```
Step 5: Normalize audio after rendering.
Even with good MIDI-level balancing, always normalize the final WAV. This catches any remaining peaks and ensures consistent output volume.
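The normalization idea can be shown on bare sample values. A minimal sketch (in practice you would apply the same gain math to the NumPy array before writing the WAV, as the render code below does):

```python
def peak_normalize(samples, headroom=0.9):
    """Scale float samples so the loudest peak sits at `headroom` of full scale."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0:
        return list(samples)  # silence stays silence; avoid dividing by zero
    gain = headroom / peak
    return [s * gain for s in samples]
```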
5) Render policy
Always render final MIDI to WAV. Two options depending on quality needs:
Option A: Python API (simple, no effects)
```python
import numpy as np
import scipy.io.wavfile

audio = midi.fluidsynth(fs=44100, sf2_path=sf2_path)
peak = np.max(np.abs(audio))
if peak > 0:
    audio = audio / peak * 0.9  # normalize to 90% of full scale
audio = (audio * 32767).astype(np.int16)
scipy.io.wavfile.write(output_wav, 44100, audio)
```
Option B: FluidSynth CLI (better quality — adds reverb and chorus)
```python
import subprocess

cmd = [
    "fluidsynth", "-n", "-i",
    "-F", output_wav, "-T", "wav",
    "-r", "44100",
    "-g", "0.4",
    "-R", "1", "-C", "1",
    "-o", "synth.reverb.room-size=0.75",
    "-o", "synth.reverb.level=0.6",
    "-o", "synth.chorus.level=3.0",
    "-o", "synth.chorus.depth=6.0",
    soundfont_path, output_midi,
]
subprocess.run(cmd, check=True)
```
Use Option B for multi-instrument pieces — the reverb and chorus add crucial spatial depth that the Python API doesn't provide. Use Option A for quick previews or when FluidSynth CLI isn't available.
The `-g 0.4` (gain) flag prevents clipping on dense arrangements. Adjust it upward for sparse solo pieces.
For MP3 conversion, use the bundled script:
```shell
python scripts/wav2mp3.py output.wav
```
Remix-Music
Use this when the user provides an existing MIDI and wants it re-orchestrated, rearranged, or transformed.
1) Remix workspace policy
Create a dedicated folder in the project root:
```
remix_faded_orchestral_keep_timing/
```
Copy (never move) the source MIDI into the remix folder.
Expected files:
- `source.mid` (copy of the original)
- `remix.py`
- `remix.mid`
- `remix.wav` (required)
- `remix.mp3` (optional)
2) Hard remix constraints
Unless the user explicitly asks to change timing:
- Keep rhythm steps identical to original
- Keep melody note timing identical (start/end preserved)
- Keep rhythmic grid and groove alignment
These constraints exist because timing is the soul of a piece — changing it makes it a different song, not a remix. Re-orchestration (new instruments, new harmony, new dynamics) can transform a piece completely while honoring the original groove.
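The constraint can be made mechanical: whatever else changes, note start/end times are copied through untouched. A minimal sketch (function name and the `(pitch, start, end, velocity)` tuple format are illustrative):

```python
def reinstrument(notes, pitch_shift=0, velocity_scale=1.0):
    """Copy notes to a new instrument part, preserving every start/end time.

    Pitch (e.g. octave shifts for a new register) and velocity may change;
    timing may not. Notes are (pitch, start, end, velocity) tuples.
    """
    out = []
    for pitch, start, end, vel in notes:
        new_vel = max(1, min(127, int(vel * velocity_scale)))  # clamp to MIDI range
        out.append((pitch + pitch_shift, start, end, new_vel))
    return out
```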
Validate timing preservation after every remix:
```shell
python scripts/compare_midi_timing.py source.mid remix.mid
```
3) Analysis-first workflow
Before writing any remix code, understand the source material:
```shell
python scripts/read_midi.py "source.mid" 30
python scripts/midi4llm.py "source.mid" 30
python scripts/midi_summary.py "source.mid"
python scripts/list_program.py "soundfonts/YourFont.sf2"
```
- `read_midi.py` — detailed per-note data (pitch, velocity, timing)
- `midi4llm.py` — compact step-based format optimized for LLM reasoning
- `midi_summary.py` — high-level overview (tempo, instruments, note counts, ranges)
- `list_program.py` — discover what's available in the target soundfont
Read the analysis output carefully before planning the remix. Identify:
- Where the melody lives (which track, pitch range)
- What the bass is doing (pattern, register)
- Section boundaries (look for density changes, key changes, tempo shifts)
- Which notes are timing-critical anchors vs. ornamental fills
4) Remix generation approach
Use pretty_midi to re-orchestrate while preserving timing:
- Extract timing-critical melody and rhythm anchors from the source
- Re-instrument: assign notes to new GM programs based on their role
- Add harmony layers and counter-melodies that fit the original rhythm grid
- Shape dynamics with the section intensity approach from Create-Music
- Add expression (CC 11 swells, CC 91 reverb) to bring the new arrangement to life
Always render the final remix to WAV using the FluidSynth CLI method (Option B above) for best quality.
Scripts
Bundled utilities live in the skill's scripts/ folder; use them instead of writing one-off equivalents:
| Script | Purpose | When to use |
|---|---|---|
| `scripts/read_midi.py <file> [max_sec]` | Per-note detail dump | Before remixing; debugging note issues |
| `scripts/midi4llm.py <file> [max_sec]` | Step-based compact format | Before remixing; understanding rhythm |
| `scripts/midi_summary.py <file>` | High-level overview | Quick check of any MIDI |
| `scripts/compare_midi_timing.py <a> <b>` | Timing diff validation | After every remix to verify constraints |
| `scripts/list_program.py <sf2> [--contains X]` | Soundfont preset browser | When choosing instruments from a custom sf2 |
| `scripts/wav2mp3.py <wav> [mp3]` | WAV to MP3 via ffmpeg | When the user wants MP3 output |
Additional behavior
- Ad-hoc Python one-liners in bash are fine for extra MIDI inspection; use them liberally while remixing so you understand the music's timing
- Keep outputs musically coherent and accurate to the prompt
- When the user's prompt is vague ("make something cool"), ask about mood, tempo, duration, and instruments before composing — but have sensible defaults ready if they just want you to go for it
- If the user references a specific genre or artist, lean into characteristic elements of that style (chord voicings, tempo range, typical instruments, rhythmic patterns)
# Supported AI Coding Agents
This skill follows the SKILL.md standard and works with all major AI coding agents that support it.