mojo-best-practices

Name: mojo-best-practices
Author: modularml

by @modularml in Tools

# Install this skill:

npx skills add modularml/agent-skills --skill "mojo-best-practices"

Install specific skill from multi-skill repository

# SKILL.md

name: mojo-best-practices
description: >
Mojo programming best practices from the official modular/modular repository.
Use when writing, reviewing, or optimizing Mojo code. Covers memory safety,
ownership patterns, GPU kernels (SM90/SM100 tensor cores), BLAS integration,
testing patterns, and performance optimization. Supports both stable (v25.7)
and nightly (v0.26.1).

Mojo Best Practices

Best practices for Mojo programming. 126 rules across 12 categories.

Version Support

This skill supports both stable and nightly Mojo versions:

| Version | Mojo | Rules Directory |
|---|---|---|
| Stable | v25.7 | `rules/` + `rules/stable/` |
| Nightly | v0.26.1 | `rules/` + `rules/nightly/` |

Detect your version: run `mojo --version`, or check `pixi list | grep mojo`.
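As a rough illustration of how the version check could drive rule selection, the Python sketch below maps a version string to the rule directories from the table above. The function names are hypothetical, and the exact text printed by `mojo --version` is an assumption: the sketch only expects a dotted version number somewhere in the string.

```python
import re

# Directory mapping copied from the Version Support table.
RULE_DIRS = {
    "stable": ["rules/", "rules/stable/"],
    "nightly": ["rules/", "rules/nightly/"],
}


def classify_mojo_version(version_output: str) -> str:
    """Return "stable", "nightly", or "unknown" for a version string.

    Assumption: the string contains a dotted version such as
    "25.7.0" or "0.26.1"; the real CLI output format may differ.
    """
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", version_output)
    if match is None:
        return "unknown"
    major, minor = int(match.group(1)), int(match.group(2))
    patch = int(match.group(3) or 0)
    if (major, minor) == (25, 7):
        return "stable"
    if major == 0 and (minor, patch) >= (26, 1):
        return "nightly"
    return "unknown"


def rule_dirs_for(version_output: str) -> list[str]:
    """Pick rule directories; fall back to the shared rules only."""
    return RULE_DIRS.get(classify_mojo_version(version_output), ["rules/"])
```

Falling back to the shared `rules/` directory for an unrecognized version is a design choice of this sketch, not something the skill specifies.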

Key differences: most breaking changes have now landed in both versions. The following features are nightly-only:

| Feature | Stable (v25.7) | Nightly (v0.26.1+) |
|---|---|---|
| Constants | `alias` | `comptime` (preferred, `alias` deprecated) |
| Struct alignment | Not available | `@align(N)` decorator |
| Typed errors | Not available | `fn foo() raises CustomError` |
| Never type | Not available | `fn abort() -> Never` |
| Compile-time expr | Implicit | `comptime(expr)` explicit |
| Trait methods | `pass` only | `...` (no default) vs `pass` (empty) |
| Fn type conversion | Explicit | Non-raising → raising implicit |
| Copyable trait | `Copyable, Movable` | `Copyable` refines `Movable` |
| Struct reflection | Not available | `struct_field_count[T]()` |
| Linear types | `AnyType` needs `__del__()` | `ImplicitlyDestructible` trait |

Shared syntax (v25.7+):
- `@fieldwise_init` (not `@value`)
- `var`/`deinit` (not `owned`)
- `Writable` trait (not `Stringable`)

Related: `max-best-practices` for MAX Serve deployment and inference.

Quick Decision Guide

| Goal | Category | Key Rules |
|---|---|---|
| Write safe code | Memory Safety | `memory-ownership-transfer`, `memory-lifecycle-methods` |
| Maximum performance | Performance | `perf-vectorize`, `perf-parallelize` (17,200x vs Python) |
| GPU acceleration | GPU Programming | `gpu-fundamentals`, `gpu-tensor-core-sm90-sm100` |
| BLAS acceleration | C Interop | `ffi-blas-accelerate` (25-32x speedup) |
| Migrate from Python | Python Interop | `python-type-conversion`, `python-minimize-crossing` |
| Design APIs | Struct + Function | `struct-trait-conformance`, `fn-argument-conventions` |

Rule Categories

| Priority | Category | Count | Prefix |
|---|---|---|---|
| CRITICAL | Memory Safety & Ownership | 11 | `memory-` |
| CRITICAL | Type System | 9 | `type-` |
| CRITICAL | GPU Programming | 17 | `gpu-` |
| CRITICAL | C Interoperability | 10 | `ffi-` |
| HIGH | Struct Design | 9 | `struct-` |
| HIGH | Function Design | 7 | `fn-` |
| HIGH | Testing | 4 | `test-` |
| HIGH | Debugging | 2 | `debug-` |
| MEDIUM-HIGH | Error Handling | 5 | `error-` |
| MEDIUM | Performance Optimization | 31 | `perf-` |
| MEDIUM | Python Interoperability | 4 | `python-` |
| LOW | Advanced Metaprogramming | 5 | `meta-` |
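Because every rule name starts with its category prefix, a rule can be mapped back to its category and priority mechanically. The Python sketch below shows one way an agent or script might do that; the function names are hypothetical, and the data is copied from the table above.

```python
# Prefix -> (category, priority), copied from the Rule Categories table.
CATEGORIES = {
    "memory": ("Memory Safety & Ownership", "CRITICAL"),
    "type": ("Type System", "CRITICAL"),
    "gpu": ("GPU Programming", "CRITICAL"),
    "ffi": ("C Interoperability", "CRITICAL"),
    "struct": ("Struct Design", "HIGH"),
    "fn": ("Function Design", "HIGH"),
    "test": ("Testing", "HIGH"),
    "debug": ("Debugging", "HIGH"),
    "error": ("Error Handling", "MEDIUM-HIGH"),
    "perf": ("Performance Optimization", "MEDIUM"),
    "python": ("Python Interoperability", "MEDIUM"),
    "meta": ("Advanced Metaprogramming", "LOW"),
}

# Highest priority first.
PRIORITY_ORDER = ["CRITICAL", "HIGH", "MEDIUM-HIGH", "MEDIUM", "LOW"]


def categorize(rule_name: str) -> tuple[str, str]:
    """Return (category, priority) for a rule such as "perf-vectorize"."""
    prefix = rule_name.split("-", 1)[0]
    if prefix not in CATEGORIES:
        raise ValueError(f"unknown rule prefix: {prefix!r}")
    return CATEGORIES[prefix]


def sort_by_priority(rule_names: list[str]) -> list[str]:
    """Order rules from CRITICAL to LOW, keeping input order within a tier."""
    return sorted(
        rule_names,
        key=lambda name: PRIORITY_ORDER.index(categorize(name)[1]),
    )
```

Sorting by priority gives a reasonable review order when several rules apply to the same piece of code.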

Memory Safety (CRITICAL)

| Rule | Pattern |
|---|---|
| `memory-ownership-transfer` | Use `^` for ownership transfer |
| `memory-borrow-vs-copy` | Prefer `read` over copying |
| `memory-lifecycle-methods` | Implement `__init__`, `__del__`, etc. |
| `memory-safe-pointers` | Use `OwnedPointer`/`ArcPointer` over `UnsafePointer` |
| `memory-origin-tracking` | Explicit origin for `UnsafePointer` |
| `memory-atomic-refcounting` | MONOTONIC/RELEASE/ACQUIRE ordering |

Type System (CRITICAL)

| Rule | Pattern |
|---|---|
| `type-explicit-annotations` | Always annotate types (10-100x vs dynamic) |
| `type-simd-vectorization` | Use SIMD for numerics (4-16x speedup) |
| `type-register-passable` | `@register_passable` for small types |
| `type-trait-bounds` | `[T: Trait1 & Trait2]` bounds |

GPU Programming (CRITICAL)

| Rule | Pattern |
|---|---|
| `gpu-fundamentals` | Thread hierarchy, DeviceContext |
| `gpu-synchronization` | `barrier()`, `syncwarp()`, named barriers |
| `gpu-warp-specialization` | Separate warps for load/compute/epilogue |
| `gpu-tensor-core-sm90-sm100` | WGMMA (SM90), UMMA (SM100) patterns |
| `gpu-tma-loading` | TMA hardware for 2D tile loads |
| `gpu-shared-memory-swizzle` | Swizzle patterns for bank-free access |

C Interoperability (CRITICAL)

| Rule | Pattern |
|---|---|
| `ffi-blas-accelerate` | Apple BLAS (25-32x matmul speedup) |
| `ffi-apple-amx-blas` | Apple AMX coprocessor via BLAS (2700 GFLOP/s) |
| `ffi-libc-functions` | `external_call` for C functions |
| `ffi-binary-data-patterns` | Read/write binary files |

Performance (MEDIUM)

| Rule | Pattern |
|---|---|
| `perf-vectorize` | `vectorize` function (4-16x SIMD speedup) |
| `perf-parallelize` | `parallelize` + SIMD (17,200x vs Python) |
| `perf-multiple-accumulators` | 8 SIMD accumulators for ILP (1.5-2x speedup) |
| `perf-early-simd-exit` | Exit when all SIMD lanes complete (10-50% speedup) |
| `perf-algorithm-shortcuts` | Cardioid skip, trig identities (20-50% speedup) |
| `perf-polynomial-approximation` | SIMD trig via Taylor/Chebyshev (5-20x speedup) |
| `perf-precision-tradeoffs` | Float32 over Float64 (2x throughput) |
| `perf-memory-prefetch` | Software prefetching for memory-bound ops |
| `perf-memory-layout` | SoA, transpose for coalesced access |
| `perf-raw-pointers` | `UnsafePointer` for hot paths (2.3x vs List) |

Testing (HIGH)

| Rule | Pattern |
|---|---|
| `test-suite-patterns` | `TestSuite.discover_tests[]().run()` |
| `test-benchmark-patterns` | `keep()`, `clobber_memory()` |

File Structure

skills/mojo-best-practices/
├── SKILL.md               # Quick reference (this file)
├── AGENTS.md              # Auto-generated rule index
├── metadata.json          # Skill metadata
├── CHANGELOG.md           # Skill version history
├── reference/             # Detailed reference docs
│   └── breaking-changes.md
└── rules/                 # Rules for both versions (122)
    ├── memory-*.md        
    ├── gpu-*.md           
    ├── perf-*.md          
    ├── stable/            # Stable-preferred syntax
    │   └── meta-alias-constants.md
    └── nightly/           # Nightly-only features (v0.26.1+)
        ├── meta-comptime-values.md
        ├── meta-comptime-expression.md
        ├── type-trait-refinement.md
        └── type-linear-types.md

Local Implementation Notes

When using this skill in a project, agents should collect implementation notes locally within that project, not globally. This ensures project-specific learnings stay with the project.

Where to store notes:

your-project/
├── IMPLEMENTATION_NOTES.md    # Project-specific learnings
├── .cursor/
│   └── rules/                 # Project-specific rules
└── ...

What to capture:
- Version-specific workarounds discovered
- Performance optimizations that worked for this codebase
- API quirks encountered
- Build configuration decisions
- Platform-specific adjustments (macOS/Linux/GPU)

Usage: Agents should check for and update `IMPLEMENTATION_NOTES.md` in the project root when discovering new patterns or resolving issues.
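A minimal Python sketch of the check-and-update step follows, assuming notes are kept as dated bullet lines. The helper name, the file header, and the entry format are all hypothetical; the skill itself only specifies the file name and its location in the project root.

```python
from datetime import date
from pathlib import Path

NOTES_FILE = "IMPLEMENTATION_NOTES.md"


def append_note(project_root, category, text, today=None):
    """Append one dated note to the project's notes file.

    Creates the file with a header if it does not exist yet.
    The "- <date> [<category>] <text>" layout is an assumed
    convention for this sketch, not one defined by the skill.
    """
    notes_path = Path(project_root) / NOTES_FILE
    stamp = (today or date.today()).isoformat()
    is_new = not notes_path.exists()
    with notes_path.open("a", encoding="utf-8") as handle:
        if is_new:
            handle.write("# Implementation Notes\n")
        handle.write(f"\n- {stamp} [{category}] {text}\n")
    return notes_path
```

Appending rather than rewriting keeps earlier notes intact, and writing inside the project root keeps the learnings local to that project.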

- Need a specific rule? Check the `rules/` directory.
- Breaking changes? See `reference/breaking-changes.md`.
- Full rule index? See `AGENTS.md`.

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents:

⚡ Amp 🚀 Antigravity 🤖 Claude Code 🦀 Clawdbot 📝 Codex ▶️ Cursor 🤖 Droid 💎 Gemini CLI 🐙 GitHub Copilot 🪿 Goose 📊 Kilo Code 🔧 Kiro CLI 💻 OpenCode 🦘 Roo Code 🌲 Trae 🏄 Windsurf
