modularml

mojo-best-practices

# Install this skill:
npx skills add modularml/agent-skills --skill "mojo-best-practices"

Install specific skill from multi-skill repository

# SKILL.md


---
name: mojo-best-practices
description: >
  Mojo programming best practices from the official modular/modular repository.
  Use when writing, reviewing, or optimizing Mojo code. Covers memory safety,
  ownership patterns, GPU kernels (SM90/SM100 tensor cores), BLAS integration,
  testing patterns, and performance optimization. Supports both stable (v25.7)
  and nightly (v0.26.1).
---


Mojo Best Practices

Best practices for Mojo programming. 126 rules across 12 categories.

Version Support

This skill supports both stable and nightly Mojo versions:

| Version | Mojo | Rules Directory |
|---------|------|-----------------|
| Stable | v25.7 | `rules/` + `rules/stable/` |
| Nightly | v0.26.1 | `rules/` + `rules/nightly/` |

Detect your version: run `mojo --version` or check `pixi list | grep mojo`.
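
As a rough sketch of the detection step, the shell function below maps a version string to the rule directories listed in the version table. The function name `rules_for_version` and the substring matching are assumptions made for illustration, not part of the skill. The exact output format of `mojo --version` is not specified here, so the patterns only look for the version numbers this skill names.

```shell
#!/bin/sh
# Hypothetical helper (not part of the skill): choose rule directories
# from a Mojo version string, such as the output of `mojo --version`.
# Usage (assumes mojo is on PATH): rules_for_version "$(mojo --version)"
rules_for_version() {
  case "$1" in
    *0.26.*) echo "rules/ rules/nightly/" ;;  # nightly (v0.26.1+)
    *25.7*)  echo "rules/ rules/stable/" ;;   # stable (v25.7)
    *)       echo "rules/" ;;                 # unrecognized: shared rules only
  esac
}
```

Falling back to the shared `rules/` directory for unrecognized versions is a design choice of this sketch: it avoids applying stable-only or nightly-only guidance to a toolchain that matches neither.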

Key differences: Most breaking changes are now in both versions. Nightly-only features:

| Feature | Stable (v25.7) | Nightly (v0.26.1+) |
|---------|----------------|--------------------|
| Constants | `alias` | `comptime` (preferred, `alias` deprecated) |
| Struct alignment | Not available | `@align(N)` decorator |
| Typed errors | Not available | `fn foo() raises CustomError` |
| Never type | Not available | `fn abort() -> Never` |
| Compile-time expr | Implicit | `comptime(expr)` explicit |
| Trait methods | `pass` only | `...` (no default) vs `pass` (empty) |
| Fn type conversion | Explicit | Non-raising → raising implicit |
| Copyable trait | `Copyable, Movable` | `Copyable` refines `Movable` |
| Struct reflection | Not available | `struct_field_count[T]()` |
| Linear types | `AnyType` needs `__del__()` | `ImplicitlyDestructible` trait |

Shared syntax (v25.7+):
- @fieldwise_init (not @value)
- var/deinit (not owned)
- Writable trait (not Stringable)

stable changelog | nightly changelog | breaking changes

Related: max-best-practices for MAX Serve deployment and inference.

Quick Decision Guide

| Goal | Category | Key Rules |
|------|----------|-----------|
| Write safe code | Memory Safety | `memory-ownership-transfer`, `memory-lifecycle-methods` |
| Maximum performance | Performance | `perf-vectorize`, `perf-parallelize` (17,200x vs Python) |
| GPU acceleration | GPU Programming | `gpu-fundamentals`, `gpu-tensor-core-sm90-sm100` |
| BLAS acceleration | C Interop | `ffi-blas-accelerate` (25-32x speedup) |
| Migrate from Python | Python Interop | `python-type-conversion`, `python-minimize-crossing` |
| Design APIs | Struct + Function | `struct-trait-conformance`, `fn-argument-conventions` |

Rule Categories

| Priority | Category | Count | Prefix |
|----------|----------|-------|--------|
| CRITICAL | Memory Safety & Ownership | 11 | `memory-` |
| CRITICAL | Type System | 9 | `type-` |
| CRITICAL | GPU Programming | 17 | `gpu-` |
| CRITICAL | C Interoperability | 10 | `ffi-` |
| HIGH | Struct Design | 9 | `struct-` |
| HIGH | Function Design | 7 | `fn-` |
| HIGH | Testing | 4 | `test-` |
| HIGH | Debugging | 2 | `debug-` |
| MEDIUM-HIGH | Error Handling | 5 | `error-` |
| MEDIUM | Performance Optimization | 31 | `perf-` |
| MEDIUM | Python Interoperability | 4 | `python-` |
| LOW | Advanced Metaprogramming | 5 | `meta-` |

Memory Safety (CRITICAL)

| Rule | Pattern |
|------|---------|
| `memory-ownership-transfer` | Use `^` for ownership transfer |
| `memory-borrow-vs-copy` | Prefer `read` over copying |
| `memory-lifecycle-methods` | Implement `__init__`, `__del__`, etc. |
| `memory-safe-pointers` | Use `OwnedPointer`/`ArcPointer` over `UnsafePointer` |
| `memory-origin-tracking` | Explicit origin for `UnsafePointer` |
| `memory-atomic-refcounting` | MONOTONIC/RELEASE/ACQUIRE ordering |

Type System (CRITICAL)

| Rule | Pattern |
|------|---------|
| `type-explicit-annotations` | Always annotate types (10-100x vs dynamic) |
| `type-simd-vectorization` | Use SIMD for numerics (4-16x speedup) |
| `type-register-passable` | `@register_passable` for small types |
| `type-trait-bounds` | `[T: Trait1 & Trait2]` bounds |

GPU Programming (CRITICAL)

| Rule | Pattern |
|------|---------|
| `gpu-fundamentals` | Thread hierarchy, `DeviceContext` |
| `gpu-synchronization` | `barrier()`, `syncwarp()`, named barriers |
| `gpu-warp-specialization` | Separate warps for load/compute/epilogue |
| `gpu-tensor-core-sm90-sm100` | WGMMA (SM90), UMMA (SM100) patterns |
| `gpu-tma-loading` | TMA hardware for 2D tile loads |
| `gpu-shared-memory-swizzle` | Swizzle patterns for bank-free access |

C Interoperability (CRITICAL)

| Rule | Pattern |
|------|---------|
| `ffi-blas-accelerate` | Apple BLAS (25-32x matmul speedup) |
| `ffi-apple-amx-blas` | Apple AMX coprocessor via BLAS (2700 GFLOP/s) |
| `ffi-libc-functions` | `external_call` for C functions |
| `ffi-binary-data-patterns` | Read/write binary files |

Performance (MEDIUM)

| Rule | Pattern |
|------|---------|
| `perf-vectorize` | `vectorize` function (4-16x SIMD speedup) |
| `perf-parallelize` | `parallelize` + SIMD (17,200x vs Python) |
| `perf-multiple-accumulators` | 8 SIMD accumulators for ILP (1.5-2x speedup) |
| `perf-early-simd-exit` | Exit when all SIMD lanes complete (10-50% speedup) |
| `perf-algorithm-shortcuts` | Cardioid skip, trig identities (20-50% speedup) |
| `perf-polynomial-approximation` | SIMD trig via Taylor/Chebyshev (5-20x speedup) |
| `perf-precision-tradeoffs` | Float32 over Float64 (2x throughput) |
| `perf-memory-prefetch` | Software prefetching for memory-bound ops |
| `perf-memory-layout` | SoA, transpose for coalesced access |
| `perf-raw-pointers` | `UnsafePointer` for hot paths (2.3x vs `List`) |

Testing (HIGH)

| Rule | Pattern |
|------|---------|
| `test-suite-patterns` | `TestSuite.discover_tests[]().run()` |
| `test-benchmark-patterns` | `keep()`, `clobber_memory()` |

File Structure

skills/mojo-best-practices/
├── SKILL.md               # Quick reference (this file)
├── AGENTS.md              # Auto-generated rule index
├── metadata.json          # Skill metadata
├── CHANGELOG.md           # Skill version history
├── reference/             # Detailed reference docs
│   └── breaking-changes.md
└── rules/                 # Rules for both versions (122)
    ├── memory-*.md
    ├── gpu-*.md
    ├── perf-*.md
    ├── stable/            # Stable-preferred syntax
    │   └── meta-alias-constants.md
    └── nightly/           # Nightly-only features (v0.26.1+)
        ├── meta-comptime-values.md
        ├── meta-comptime-expression.md
        ├── type-trait-refinement.md
        └── type-linear-types.md

Local Implementation Notes

When using this skill in a project, agents should collect implementation notes locally within that project, not globally. This ensures project-specific learnings stay with the project.

Where to store notes:

your-project/
├── IMPLEMENTATION_NOTES.md    # Project-specific learnings
├── .cursor/
│   └── rules/                 # Project-specific rules
└── ...

What to capture:
- Version-specific workarounds discovered
- Performance optimizations that worked for this codebase
- API quirks encountered
- Build configuration decisions
- Platform-specific adjustments (macOS/Linux/GPU)

Usage: Agents should check for and update IMPLEMENTATION_NOTES.md in the project root when discovering new patterns or resolving issues.
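
One way an agent might follow this convention is sketched below. The helper name `add_note`, the file title, and the dated bullet format are assumptions for illustration only; the skill itself prescribes just the file name and its location in the project root.

```shell
#!/bin/sh
# Hypothetical helper (not part of the skill): append a dated note to
# IMPLEMENTATION_NOTES.md in the given project root, creating it if absent.
add_note() {
  root="$1"
  note="$2"
  file="$root/IMPLEMENTATION_NOTES.md"
  # Create the file with a title line on first use.
  [ -f "$file" ] || printf '# Implementation Notes\n\n' > "$file"
  # Append one bullet per note, prefixed with today's date.
  printf '%s\n' "- $(date +%Y-%m-%d): $note" >> "$file"
}
```

Calling `add_note . "some observation"` from the project root keeps the notes local to that project, which is the point of the guidance above.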

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.