npx skills add modularml/agent-skills --skill "mojo-best-practices"
Install specific skill from multi-skill repository
# SKILL.md
name: mojo-best-practices
description: >
Mojo programming best practices from the official modular/modular repository.
Use when writing, reviewing, or optimizing Mojo code. Covers memory safety,
ownership patterns, GPU kernels (SM90/SM100 tensor cores), BLAS integration,
testing patterns, and performance optimization. Supports both stable (v25.7)
and nightly (v0.26.1).
Mojo Best Practices
Best practices for Mojo programming. 126 rules across 12 categories.
Version Support
This skill supports both stable and nightly Mojo versions:
| Version | Mojo | Rules Directory |
|---|---|---|
| Stable | v25.7 | rules/ + rules/stable/ |
| Nightly | v0.26.1 | rules/ + rules/nightly/ |
Detect your version: Run mojo --version or check pixi list | grep mojo
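As a minimal sketch of the detection step, the helper below maps a version string to the rule directories listed in the table above. The function name `rules_dirs_for` is hypothetical and not part of the skill. The sketch also assumes that the output of `mojo --version` contains the version number somewhere in its text; the exact output format is not specified here.

```shell
#!/bin/sh
# Hypothetical helper: map a Mojo version string to the rule directories
# from the Version Support table. Matches on a substring only.
rules_dirs_for() {
  case "$1" in
    *0.26.*) echo "rules/ rules/nightly/" ;;  # nightly (v0.26.1+)
    *25.7*)  echo "rules/ rules/stable/" ;;   # stable (v25.7)
    *)       echo "unknown" ;;                # anything else: check manually
  esac
}

# Only query the toolchain when mojo is actually on PATH.
if command -v mojo >/dev/null 2>&1; then
  rules_dirs_for "$(mojo --version)"
fi
```

Versions that match neither pattern print `unknown`, which is a cue to consult the changelogs rather than guess a directory.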
Key differences: Most breaking changes are now in both versions. Nightly-only features:
| Feature | Stable (v25.7) | Nightly (v0.26.1+) |
|---|---|---|
| Constants | alias | comptime (preferred, alias deprecated) |
| Struct alignment | Not available | @align(N) decorator |
| Typed errors | Not available | fn foo() raises CustomError |
| Never type | Not available | fn abort() -> Never |
| Compile-time expr | Implicit | comptime(expr) explicit |
| Trait methods | pass only | ... (no default) vs pass (empty) |
| Fn type conversion | Explicit | Non-raising → raising implicit |
| Copyable trait | Copyable, Movable | Copyable refines Movable |
| Struct reflection | Not available | struct_field_count[T]() |
| Linear types | AnyType needs __del__() | ImplicitlyDestructible trait |
Shared syntax (v25.7+):
- @fieldwise_init (not @value)
- var/deinit (not owned)
- Writable trait (not Stringable)
stable changelog | nightly changelog | breaking changes
Related: max-best-practices for MAX Serve deployment and inference.
Quick Decision Guide
| Goal | Category | Key Rules |
|---|---|---|
| Write safe code | Memory Safety | memory-ownership-transfer, memory-lifecycle-methods |
| Maximum performance | Performance | perf-vectorize, perf-parallelize (17,200x vs Python) |
| GPU acceleration | GPU Programming | gpu-fundamentals, gpu-tensor-core-sm90-sm100 |
| BLAS acceleration | C Interop | ffi-blas-accelerate (25-32x speedup) |
| Migrate from Python | Python Interop | python-type-conversion, python-minimize-crossing |
| Design APIs | Struct + Function | struct-trait-conformance, fn-argument-conventions |
Rule Categories
| Priority | Category | Count | Prefix |
|---|---|---|---|
| CRITICAL | Memory Safety & Ownership | 11 | memory- |
| CRITICAL | Type System | 9 | type- |
| CRITICAL | GPU Programming | 17 | gpu- |
| CRITICAL | C Interoperability | 10 | ffi- |
| HIGH | Struct Design | 9 | struct- |
| HIGH | Function Design | 7 | fn- |
| HIGH | Testing | 4 | test- |
| HIGH | Debugging | 2 | debug- |
| MEDIUM-HIGH | Error Handling | 5 | error- |
| MEDIUM | Performance Optimization | 31 | perf- |
| MEDIUM | Python Interoperability | 4 | python- |
| LOW | Advanced Metaprogramming | 5 | meta- |
Memory Safety (CRITICAL)
| Rule | Pattern |
|---|---|
| memory-ownership-transfer | Use ^ for ownership transfer |
| memory-borrow-vs-copy | Prefer read over copying |
| memory-lifecycle-methods | Implement __init__, __del__, etc. |
| memory-safe-pointers | Use OwnedPointer/ArcPointer over UnsafePointer |
| memory-origin-tracking | Explicit origin for UnsafePointer |
| memory-atomic-refcounting | MONOTONIC/RELEASE/ACQUIRE ordering |
Type System (CRITICAL)
| Rule | Pattern |
|---|---|
| type-explicit-annotations | Always annotate types (10-100x vs dynamic) |
| type-simd-vectorization | Use SIMD for numerics (4-16x speedup) |
| type-register-passable | @register_passable for small types |
| type-trait-bounds | [T: Trait1 & Trait2] bounds |
GPU Programming (CRITICAL)
| Rule | Pattern |
|---|---|
| gpu-fundamentals | Thread hierarchy, DeviceContext |
| gpu-synchronization | barrier(), syncwarp(), named barriers |
| gpu-warp-specialization | Separate warps for load/compute/epilogue |
| gpu-tensor-core-sm90-sm100 | WGMMA (SM90), UMMA (SM100) patterns |
| gpu-tma-loading | TMA hardware for 2D tile loads |
| gpu-shared-memory-swizzle | Swizzle patterns for bank-free access |
C Interoperability (CRITICAL)
| Rule | Pattern |
|---|---|
| ffi-blas-accelerate | Apple BLAS (25-32x matmul speedup) |
| ffi-apple-amx-blas | Apple AMX coprocessor via BLAS (2700 GFLOP/s) |
| ffi-libc-functions | external_call for C functions |
| ffi-binary-data-patterns | Read/write binary files |
Performance (MEDIUM)
| Rule | Pattern |
|---|---|
| perf-vectorize | vectorize function (4-16x SIMD speedup) |
| perf-parallelize | parallelize + SIMD (17,200x vs Python) |
| perf-multiple-accumulators | 8 SIMD accumulators for ILP (1.5-2x speedup) |
| perf-early-simd-exit | Exit when all SIMD lanes complete (10-50% speedup) |
| perf-algorithm-shortcuts | Cardioid skip, trig identities (20-50% speedup) |
| perf-polynomial-approximation | SIMD trig via Taylor/Chebyshev (5-20x speedup) |
| perf-precision-tradeoffs | Float32 over Float64 (2x throughput) |
| perf-memory-prefetch | Software prefetching for memory-bound ops |
| perf-memory-layout | SoA, transpose for coalesced access |
| perf-raw-pointers | UnsafePointer for hot paths (2.3x vs List) |
Testing (HIGH)
| Rule | Pattern |
|---|---|
| test-suite-patterns | TestSuite.discover_tests[]().run() |
| test-benchmark-patterns | keep(), clobber_memory() |
File Structure
skills/mojo-best-practices/
├── SKILL.md              # Quick reference (this file)
├── AGENTS.md             # Auto-generated rule index
├── metadata.json         # Skill metadata
├── CHANGELOG.md          # Skill version history
├── reference/            # Detailed reference docs
│   └── breaking-changes.md
└── rules/                # Rules for both versions (122)
    ├── memory-*.md
    ├── gpu-*.md
    ├── perf-*.md
    ├── stable/           # Stable-preferred syntax
    │   └── meta-alias-constants.md
    └── nightly/          # Nightly-only features (v0.26.1+)
        ├── meta-comptime-values.md
        ├── meta-comptime-expression.md
        ├── type-trait-refinement.md
        └── type-linear-types.md
Local Implementation Notes
When using this skill in a project, agents should collect implementation notes locally within that project, not globally. This ensures project-specific learnings stay with the project.
Where to store notes:
your-project/
├── IMPLEMENTATION_NOTES.md   # Project-specific learnings
├── .cursor/
│   └── rules/                # Project-specific rules
└── ...
What to capture:
- Version-specific workarounds discovered
- Performance optimizations that worked for this codebase
- API quirks encountered
- Build configuration decisions
- Platform-specific adjustments (macOS/Linux/GPU)
Usage: Agents should check for and update IMPLEMENTATION_NOTES.md in the project root when discovering new patterns or resolving issues.
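The check-and-update step could look roughly like the sketch below. The `add_note` function, the file header text, and the one-bullet-per-note layout are assumptions for illustration; the skill itself only names the file and its location in the project root.

```shell
#!/bin/sh
# Hypothetical helper: append one note to IMPLEMENTATION_NOTES.md in a
# project root, creating the file with a header if it does not exist yet.
# $1 = project root directory, $2 = note text
add_note() {
  notes_file="$1/IMPLEMENTATION_NOTES.md"
  # Create the file on first use so later appends always have a target.
  [ -f "$notes_file" ] || printf '# Implementation Notes\n\n' > "$notes_file"
  # Record the note as a single Markdown bullet.
  printf -- '- %s\n' "$2" >> "$notes_file"
}
```

For example, `add_note . "Nightly: prefer comptime over alias"` run from the project root keeps the learning with the project rather than in a global location.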
Navigation
- Need a specific rule? Check the rules/ directory
- Breaking changes? See reference/breaking-changes.md
- Full rule index? See AGENTS.md
# Supported AI Coding Agents
This skill is compatible with the SKILL.md standard and works with all major AI coding agents.