mlops-engineer

by @404kidwiz in AI & LLM

# Install this skill:

npx skills add 404kidwiz/claude-supercode-skills --skill "mlops-engineer"

Install specific skill from multi-skill repository

# Description

Expert in Machine Learning Operations bridging data science and DevOps. Use when building ML pipelines, model versioning, feature stores, or production ML serving. Triggers include "MLOps", "ML pipeline", "model deployment", "feature store", "model versioning", "ML monitoring", "Kubeflow", "MLflow".

# SKILL.md

name: mlops-engineer
description: Expert in Machine Learning Operations bridging data science and DevOps. Use when building ML pipelines, model versioning, feature stores, or production ML serving. Triggers include "MLOps", "ML pipeline", "model deployment", "feature store", "model versioning", "ML monitoring", "Kubeflow", "MLflow".

MLOps Engineer

Purpose

Provides expertise in Machine Learning Operations, bridging data science and DevOps practices. Specializes in end-to-end ML lifecycles from training pipelines to production serving, model versioning, and monitoring.

When to Use

- Building ML training and serving pipelines
- Implementing model versioning and registry
- Setting up feature stores
- Deploying models to production
- Monitoring model performance and drift
- Automating ML workflows (CI/CD for ML)
- Implementing A/B testing for models
- Managing experiment tracking

Quick Start

Invoke this skill when:
- Building ML pipelines and workflows
- Deploying models to production
- Setting up model versioning and registry
- Implementing feature stores
- Monitoring production ML systems

Do NOT invoke when:
- Model development and training → use /ml-engineer
- Data pipeline ETL → use /data-engineer
- Kubernetes infrastructure → use /kubernetes-specialist
- General CI/CD without ML → use /devops-engineer

Decision Framework

ML Lifecycle Stage?
├── Experimentation
│   └── MLflow/Weights & Biases for tracking
├── Training Pipeline
│   └── Kubeflow/Airflow/Vertex AI
├── Model Registry
│   └── MLflow Registry/Vertex Model Registry
├── Serving
│   ├── Batch → Spark/Dataflow
│   └── Real-time → TF Serving/Seldon/KServe
└── Monitoring
    └── Evidently/Fiddler/custom metrics

Core Workflows

1. ML Pipeline Setup

- Define pipeline stages (data prep, training, eval)
- Choose orchestrator (Kubeflow, Airflow, Vertex)
- Containerize each pipeline step
- Implement artifact storage
- Add experiment tracking
- Configure automated retraining triggers
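
The stages above can be sketched in plain Python before committing to an orchestrator. This is a minimal illustration only: the stage functions, the toy "model" (a mean of labels), and the `save_artifact` helper are all hypothetical stand-ins, and a real pipeline would run each stage as a containerized step in Kubeflow, Airflow, or Vertex.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def prepare_data(raw):
    """Data prep stage: drop rows that contain a missing value."""
    return [row for row in raw if None not in row]


def train(rows):
    """Training stage: the 'model' is just the mean label (toy stand-in)."""
    labels = [label for _, label in rows]
    return {"mean_label": sum(labels) / len(labels)}


def evaluate(model, rows):
    """Evaluation stage: mean absolute error of the constant predictor."""
    errors = [abs(label - model["mean_label"]) for _, label in rows]
    return {"mae": sum(errors) / len(errors)}


def save_artifact(store: Path, name: str, payload) -> Path:
    """Store a JSON artifact under a content-derived name (hypothetical helper)."""
    blob = json.dumps(payload, sort_keys=True).encode()
    digest = hashlib.sha256(blob).hexdigest()[:12]
    path = store / f"{name}-{digest}.json"
    path.write_bytes(blob)
    return path


def run_pipeline(raw, store: Path):
    """Run the stages in order and persist the model and its metrics."""
    rows = prepare_data(raw)
    model = train(rows)
    metrics = evaluate(model, rows)
    artifacts = {
        "model": save_artifact(store, "model", model),
        "metrics": save_artifact(store, "metrics", metrics),
    }
    return artifacts, metrics


raw = [(1.0, 2.0), (2.0, 4.0), (3.0, None), (4.0, 6.0)]
with tempfile.TemporaryDirectory() as tmp:
    artifacts, metrics = run_pipeline(raw, Path(tmp))
    artifact_names = sorted(p.name.split("-")[0] for p in artifacts.values())
```

Keeping each stage a pure function with explicit inputs and outputs makes it straightforward to wrap the same code in whichever orchestrator is chosen later.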

2. Model Deployment

- Register model in model registry
- Build serving container
- Deploy to serving infrastructure
- Configure autoscaling
- Implement canary/shadow deployment
- Set up monitoring and alerts
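
For the canary step, one common approach is to split traffic deterministically by hashing a request or user ID, so the same caller always reaches the same model version. The sketch below assumes a hypothetical `route` function and a percentage-based split; serving platforms such as KServe or Seldon provide their own traffic-splitting configuration, which this does not reproduce.

```python
import hashlib


def route(request_id: str, canary_percent: int) -> str:
    """Return 'canary' or 'stable' for a request, deterministically."""
    digest = hashlib.sha256(request_id.encode()).digest()
    # Map the first 4 bytes of the hash to a bucket in [0, 100).
    bucket = int.from_bytes(digest[:4], "big") % 100
    return "canary" if bucket < canary_percent else "stable"


# Simulate 10,000 requests with a 10% canary split.
counts = {"stable": 0, "canary": 0}
for i in range(10_000):
    counts[route(f"req-{i}", canary_percent=10)] += 1
canary_share = counts["canary"] / 10_000
```

Raising `canary_percent` in small steps while watching error rates and latency gives a gradual rollout; setting it back to 0 is the rollback.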

3. Model Monitoring

- Define key metrics (latency, throughput, accuracy)
- Implement data drift detection
- Set up prediction monitoring
- Create alerting thresholds
- Build dashboards for visibility
- Automate retraining triggers
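
Drift detection and a retraining trigger can be illustrated with the Population Stability Index (PSI), one of several possible drift statistics. This is a sketch under assumptions: the `psi` function, the equal-width binning, and the 0.2 alert threshold (a widely used rule of thumb, not a value from this skill) are illustrative choices, and tools such as Evidently offer richer tests.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant reference

    def fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp out-of-range values into the first or last bin.
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


PSI_ALERT_THRESHOLD = 0.2  # assumed threshold; tune per feature

reference = [i / 100 for i in range(1000)]  # training-time feature values
shifted = [v + 3 for v in reference]        # simulated production drift

score = psi(reference, shifted)
should_retrain = score > PSI_ALERT_THRESHOLD
```

In practice the `should_retrain` flag would feed an alert or kick off the training pipeline rather than being evaluated inline.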

Best Practices

- Version everything: code, data, models, configs
- Use feature stores for consistency between training and serving
- Implement CI/CD specifically designed for ML workflows
- Monitor data drift and model performance continuously
- Use canary deployments for model rollouts
- Keep training and serving environments consistent
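
"Version everything" can be made concrete by deriving a single run fingerprint from the code, data, and config that produced a model. The sketch below is one possible scheme, not a standard: the `fingerprint` function and its inputs are hypothetical, and dedicated tools (Git for code, DVC or a registry for data and models) would normally supply the individual versions.

```python
import hashlib
import json


def fingerprint(code: bytes, data: bytes, config: dict) -> str:
    """Combine code, data, and config into one short, reproducible ID."""
    h = hashlib.sha256()
    parts = (code, data, json.dumps(config, sort_keys=True).encode())
    for part in parts:
        # Hash each part separately so part boundaries cannot be confused.
        h.update(hashlib.sha256(part).digest())
    return h.hexdigest()[:16]


run_a = fingerprint(b"def train(): ...", b"x,y\n1,2\n", {"lr": 0.1, "epochs": 3})
run_b = fingerprint(b"def train(): ...", b"x,y\n1,2\n", {"epochs": 3, "lr": 0.1})
run_c = fingerprint(b"def train(): ...", b"x,y\n1,3\n", {"lr": 0.1, "epochs": 3})
```

Here `run_a` and `run_b` match because only the config key order differs, while `run_c` differs because the data changed. Tagging the registered model with such an ID ties it back to the exact inputs that produced it.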

Anti-Patterns

| Anti-Pattern | Problem | Correct Approach |
|---|---|---|
| Manual deployments | Error-prone, slow | Automated ML CI/CD |
| Training-serving skew | Prediction errors | Feature stores |
| No model versioning | Can't reproduce or rollback | Model registry |
| Ignoring data drift | Silent degradation | Continuous monitoring |
| Notebook-to-production | Unmaintainable | Proper pipeline code |
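
The training-serving skew row comes down to both paths computing features with the same logic. A feature store enforces this at scale; the sketch below shows only the underlying idea with a shared function. The `build_features` function and the record fields (`purchase_amount`, `country`) are hypothetical.

```python
import math


def build_features(record: dict) -> list:
    """Single source of feature logic, imported by both training and serving."""
    return [
        math.log1p(record["purchase_amount"]),
        1.0 if record["country"] == "US" else 0.0,
    ]


def training_rows(records):
    """Offline path: build the training matrix."""
    return [build_features(r) for r in records]


def serve(record):
    """Online path: build features for a single prediction request."""
    return build_features(record)


rec = {"purchase_amount": 20.0, "country": "US"}
offline = training_rows([rec])[0]
online = serve(rec)
```

Because both paths call `build_features`, `offline` and `online` are identical for the same record; skew typically appears when the serving side reimplements this logic separately.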

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents:

⚡ Amp 🚀 Antigravity 🤖 Claude Code 🦀 Clawdbot 📝 Codex ▶️ Cursor 🤖 Droid 💎 Gemini CLI 🐙 GitHub Copilot 🪿 Goose 📊 Kilo Code 🔧 Kiro CLI 💻 OpenCode 🦘 Roo Code 🌲 Trae 🏄 Windsurf
