Install a specific skill from the multi-skill repository:

```bash
npx skills add mjunaidca/mjs-agent-skills --skill "kubernetes"
```
# Description

# SKILL.md

```yaml
name: kubernetes
description: |-
  Production-grade Kubernetes manifests and debugging for containerized applications.
  This skill should be used when users ask to deploy to Kubernetes, create K8s manifests,
  containerize for K8s, set up Deployments/Services/Jobs/StatefulSets/CronJobs, create
  namespaces with resource quotas, set up multi-team isolation, configure ResourceQuota/
  LimitRange, secure with RBAC (ServiceAccount, Role, RoleBinding), configure init
  containers (model download, db wait, migrations), set up sidecars (logging, metrics),
  or debug pods (CrashLoopBackOff, logs, exec, describe, events). Auto-detects from
  Dockerfile/code, generates hardened manifests with educational comments. CKAD-aligned.
hooks:
  PreToolUse:
    - matcher: "Bash"
      hooks:
        - type: command
          command: "bash \"$CLAUDE_PROJECT_DIR\"/.claude/hooks/verify-kubectl-context.sh"
```
Kubernetes
Production-grade K8s manifests with security-first defaults and educational comments.
Resource Detection & Adaptation
Before generating manifests, detect the target environment:
```bash
# Detect node resources
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}: {.status.capacity.memory}, {.status.capacity.cpu}{"\n"}{end}'

# Detect if Docker Desktop (local) or a real cluster
kubectl get nodes -o jsonpath='{.items[0].metadata.labels.node\.kubernetes\.io/instance-type}' 2>/dev/null || echo "local"

# Detect available resources
kubectl describe nodes | grep -A 5 "Allocated resources"
```
Adapt configurations based on detection:
| Detected Environment | Profile | Default Limits | Agent Action |
|---|---|---|---|
| Docker Desktop < 6GB | Minimal | 128Mi-256Mi | Warn, reduce replicas |
| Docker Desktop 6-10GB | Standard | 256Mi-512Mi | Normal deployment |
| Cloud/Real cluster | Production | Based on node size | Full features |
Agent Behavior
- Detect cluster type and resources before generating manifests
- Adapt resource requests/limits to cluster capacity
- Warn if requested workload exceeds available resources
- Calculate safe limits: `(node_memory * 0.7) / expected_pod_count` (see the sketch below)
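A minimal bash sketch of that calculation, assuming a single-node cluster and a configurable expected pod count (both assumptions; multi-node clusters should aggregate allocatable memory across nodes):

```bash
#!/usr/bin/env bash
# Sketch: derive a safe per-pod memory limit from node allocatable memory.
EXPECTED_PODS=${EXPECTED_PODS:-10}   # assumption: expected pod count

# Allocatable memory is typically reported in Ki, e.g. "8049788Ki"
alloc_ki=$(kubectl get nodes -o jsonpath='{.items[0].status.allocatable.memory}' | sed 's/Ki$//')
alloc_mi=$(( alloc_ki / 1024 ))

# Keep 30% headroom for system daemons and bursts, split the rest across pods
safe_limit_mi=$(( alloc_mi * 70 / 100 / EXPECTED_PODS ))

echo "Allocatable: ${alloc_mi}Mi -> safe per-pod memory limit: ~${safe_limit_mi}Mi"
```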
Adaptive Resource Templates
Local/Constrained (< 6GB allocatable):

```yaml
resources:
  requests:
    memory: 128Mi
    cpu: 100m
  limits:
    memory: 256Mi
    cpu: 500m
```

Standard (6-16GB allocatable):

```yaml
resources:
  requests:
    memory: 256Mi
    cpu: 100m
  limits:
    memory: 512Mi
    cpu: 1000m
```

Production (> 16GB or cloud):

```yaml
resources:
  requests:
    memory: 512Mi
    cpu: 250m
  limits:
    memory: 1Gi
    cpu: 2000m
```
Pre-Deployment Validation
Before applying manifests, agent should verify:
```bash
# Check if deployment would exceed node capacity
kubectl get nodes -o jsonpath='{.items[0].status.allocatable.memory}'
```
If insufficient: warn user and suggest scaling down or increasing Docker Desktop resources.
What This Skill Does
Analysis & Detection:
- Auto-detects from Dockerfile: ports, health endpoints, resources
- Identifies workload type from project structure
- Reads existing manifests to understand patterns
- Detects GPU requirements from dependencies
Generation:
- Creates production-hardened manifests (non-root, read-only, resource limits)
- Generates all supporting resources (Service, ConfigMap, HPA, PDB)
- Creates namespace governance (ResourceQuota, LimitRange, NetworkPolicy; sketched after this list)
- Supports multi-team isolation with environment progression (dev → staging → prod)
- Adds educational comments explaining WHY each config choice
- Outputs ArgoCD-compatible directory structure
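For namespace governance, a minimal sketch (the namespace name and quota sizes are illustrative placeholders, not the skill's computed defaults):

```yaml
# ResourceQuota: caps aggregate consumption for the whole namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a          # placeholder
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "20"
---
# LimitRange: per-container defaults and bounds, so quota math stays predictable
apiVersion: v1
kind: LimitRange
metadata:
  name: team-a-limits
  namespace: team-a
spec:
  limits:
    - type: Container
      defaultRequest: { cpu: 100m, memory: 128Mi }   # applied when requests unset
      default: { cpu: 500m, memory: 512Mi }          # applied when limits unset
      max: { cpu: "2", memory: 2Gi }                 # one container can't eat the quota
---
# NetworkPolicy: default-deny ingress; add explicit allows (e.g., monitoring scrape)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: team-a
spec:
  podSelector: {}            # selects all pods in the namespace
  policyTypes: ["Ingress"]   # no ingress rules listed = all ingress denied
```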
Validation:
- Verifies kubectl context exists
- Creates namespace if needed
- Deploys to local cluster (kind/minikube)
- Confirms pods are running before delivering
Security:
- Non-root user by default (runAsNonRoot: true)
- Read-only root filesystem
- No privilege escalation
- Dropped capabilities
- Resource limits always set
- Unprivileged ports only (>=1024) - privileged ports (<1024) require root
What This Skill Does NOT Do
- Generate Helm charts (document in references for future)
- Create Kustomize overlays (document in references for future)
- Handle Dapr sidecar injection (separate skill)
- Deploy Kafka/Strimzi operators (separate skill)
- Generate ArgoCD Application CRDs (separate skill)
Before Implementation
Gather context to ensure successful implementation:
| Source | Gather |
|---|---|
| Codebase | Dockerfile, existing manifests, port/health patterns |
| Conversation | Target environment, namespace, special requirements |
| Skill References | Security contexts, health probes, resource limits |
| User Guidelines | Cluster conventions, naming standards |
Required Clarifications
After auto-detection, confirm with user if ambiguous:
| Question | When to Ask |
|---|---|
| Target environment | "Deploying to local (kind/minikube) or remote cluster?" |
| Namespace | "Use existing namespace or create new?" |
| Image availability | "Is image in registry or needs to be built/loaded?" |
| Service exposure | "Internal only (ClusterIP) or external access needed?" |
| Namespace governance | "Need ResourceQuota/LimitRange for resource isolation?" |
| Multi-team setup | "Single team or multi-team with namespace isolation?" |
| Environment progression | "Creating dev/staging/prod namespaces with quota progression?" |
Pre-flight Checks (CRITICAL)
Before generating manifests, verify:
```bash
# 1. Cluster access
kubectl cluster-info

# 2. Current context
kubectl config current-context

# 3. Target namespace (create if needed)
kubectl get namespace $NAMESPACE || kubectl create namespace $NAMESPACE

# 4. Image exists (or build it)
docker images | grep $IMAGE_NAME || docker build -t $IMAGE_NAME .

# 5. For local clusters: load image
kind load docker-image $IMAGE_NAME   # or: minikube image load
```
If any check fails → stop and report. Don't generate manifests for broken state.
Auto-Detection Matrix
From Dockerfile
| Detect | How | Example |
|---|---|---|
| Port | EXPOSE instruction | EXPOSE 8000 → containerPort: 8000 |
| Health | CMD with health endpoint | uvicorn → /health or /healthz |
| User | USER instruction | USER 1000 → runAsUser: 1000 |
| Workdir | WORKDIR instruction | Context for volume mounts |
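For illustration, a hypothetical Dockerfile (not skill output) and what detection would read from it:

```dockerfile
FROM python:3.12-slim
# WORKDIR -> context for volume mounts
WORKDIR /app
COPY . .
RUN pip install --no-cache-dir -r requirements.txt
# USER 1000 -> runAsUser: 1000 in the pod securityContext
USER 1000
# EXPOSE 8000 -> containerPort: 8000
EXPOSE 8000
# uvicorn entrypoint -> try /health, then /healthz for probe paths
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
```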
Port Selection (CRITICAL for Security)
Privileged ports (<1024) conflict with runAsNonRoot: true.
| Detected Port | Action |
|---|---|
| 80, 443 | ⚠️ Use unprivileged variant (nginx-unprivileged:8080) or remap |
| 8080, 8000, 3000+ | ✅ Compatible with non-root |
Common remappings:
| Standard Image | Security-Compatible Alternative |
|----------------|--------------------------------|
| nginx (port 80) | nginxinc/nginx-unprivileged (port 8080) |
| httpd (port 80) | Configure Listen 8080 or use unprivileged image |
| redis (port 6379) | ✅ Already unprivileged |
| postgres (port 5432) | ✅ Already unprivileged |
Service abstracts this: Service port: 80 → targetPort: 8080 keeps external API stable.
From Code
| Detect | How | Example |
|---|---|---|
| Framework health | Route definitions | FastAPI /health, Express /healthz |
| Readiness | DB connection check | /health/ready with DB ping |
| Startup time | Heavy imports | ML models → startupProbe needed |
Workload Type Decision
- Is this a one-time task that completes? → Job (or CronJob if scheduled; see the sketch below)
- Does it need stable network identity or ordered deployment? → StatefulSet
- Must it run on every node? → DaemonSet
- Otherwise → Deployment (default)
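For the batch case, a minimal Job sketch (name, image, and command are placeholders; the hardened securityContext shown under Manifest Patterns below applies equally here):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate                # placeholder name
spec:
  backoffLimit: 3                 # retry failed pods up to 3 times
  ttlSecondsAfterFinished: 600    # garbage-collect the Job 10 min after completion
  template:
    spec:
      restartPolicy: Never        # Jobs require Never or OnFailure
      containers:
        - name: migrate
          image: myapp:1.2.3      # placeholder image
          command: ["python", "manage.py", "migrate"]
          resources:
            requests: { cpu: 100m, memory: 128Mi }
            limits: { cpu: 500m, memory: 256Mi }
```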
Workflow
```text
1. PRE-FLIGHT
   - Verify kubectl context
   - Check namespace exists
   - Verify image exists or build it
        ↓
2. ANALYZE PROJECT
   - Read Dockerfile for EXPOSE, HEALTHCHECK, USER
   - Scan code for health endpoints
   - Check existing k8s/ directory
   - Detect GPU requirements (torch, tensorflow)
        ↓
3. DETERMINE WORKLOAD TYPE
   - Deployment (default)
   - Job/CronJob (batch processing)
   - StatefulSet (databases, ordered)
   - DaemonSet (node-level agents)
        ↓
4. GENERATE MANIFESTS
   - Deployment/Job/StatefulSet with hardened security
   - Service (ClusterIP, NodePort, or LoadBalancer)
   - ConfigMap for non-secret config
   - HPA if autoscaling needed
   - PDB for availability
   - All with educational comments
        ↓
5. VALIDATE
   - kubectl apply --dry-run=server
   - kubectl apply -n $NAMESPACE
   - kubectl wait --for=condition=Ready pod
   - kubectl logs to verify startup
        ↓
6. DELIVER
   - Files in k8s/base/ directory
   - Summary of what was created
   - Next steps for production
```
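A sketch of step 5 (VALIDATE), assuming `$NAMESPACE` and `$APP_NAME` are set and the manifests live in k8s/base/ (the label selector matches the generated Deployment):

```bash
# Server-side dry run catches schema and admission errors without applying
kubectl apply -f k8s/base/ -n "$NAMESPACE" --dry-run=server

# Apply for real
kubectl apply -f k8s/base/ -n "$NAMESPACE"

# Block until pods are Ready (or time out)
kubectl wait --for=condition=Ready pod \
  -l app.kubernetes.io/name="$APP_NAME" -n "$NAMESPACE" --timeout=120s

# Verify clean startup
kubectl logs -l app.kubernetes.io/name="$APP_NAME" -n "$NAMESPACE" --tail=20
```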
Generated Directory Structure
```text
k8s/
├── base/                      # Raw manifests (ArgoCD-compatible)
│   ├── namespace.yaml         # Optional, if new namespace
│   ├── resourcequota.yaml     # Namespace-wide resource caps
│   ├── limitrange.yaml        # Per-container defaults and bounds
│   ├── networkpolicy.yaml     # Namespace isolation rules
│   ├── deployment.yaml        # Or job.yaml, statefulset.yaml
│   ├── service.yaml           # ClusterIP by default
│   ├── configmap.yaml         # Non-secret configuration
│   ├── hpa.yaml               # If autoscaling enabled
│   ├── pdb.yaml               # Pod Disruption Budget
│   └── kustomization.yaml     # For future Kustomize use
└── README.md                  # Deployment instructions
```
Manifest Patterns
Deployment (Default)
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${APP_NAME}
  labels:
    # Standard K8s labels (see references/labels-annotations.md)
    app.kubernetes.io/name: ${APP_NAME}
    app.kubernetes.io/instance: ${APP_NAME}-${ENV}
    app.kubernetes.io/version: "${VERSION}"
    app.kubernetes.io/component: api   # or worker, frontend
    app.kubernetes.io/part-of: ${PROJECT}
    app.kubernetes.io/managed-by: kubectl
spec:
  replicas: 2   # WHY: Minimum for availability during rolling updates
  selector:
    matchLabels:
      app.kubernetes.io/name: ${APP_NAME}
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ${APP_NAME}
    spec:
      # WHY: Security hardening - never run as root
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: ${APP_NAME}
          image: ${IMAGE}:${TAG}
          # WHY: Never use :latest - breaks reproducibility
          imagePullPolicy: IfNotPresent
          ports:
            # WHY: Port must be >=1024 for runAsNonRoot (privileged ports need root)
            # Use Service port:80 → targetPort:8080 to expose standard ports externally
            - containerPort: ${PORT}   # Must be >=1024 (e.g., 8080, 8000, 3000)
              protocol: TCP
          # WHY: Container-level security context
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
          # WHY: Prevent resource starvation, enable HPA
          resources:
            requests:
              cpu: "100m"      # 0.1 CPU cores
              memory: "128Mi"
            limits:
              cpu: "500m"      # 0.5 CPU cores
              memory: "512Mi"
          # WHY: K8s restarts if app deadlocks
          livenessProbe:
            httpGet:
              path: /health/live
              port: ${PORT}
            initialDelaySeconds: 10
            periodSeconds: 15
            failureThreshold: 3
          # WHY: Only route traffic when ready
          readinessProbe:
            httpGet:
              path: /health/ready
              port: ${PORT}
            initialDelaySeconds: 5
            periodSeconds: 10
          # WHY: Slow-starting apps (ML models) need longer startup
          startupProbe:
            httpGet:
              path: /health/live
              port: ${PORT}
            initialDelaySeconds: 0
            periodSeconds: 10
            failureThreshold: 30   # 5 minutes to start
          # WHY: Graceful shutdown for in-flight requests
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 5"]
      # WHY: Allow time for graceful shutdown
      terminationGracePeriodSeconds: 30
```
Service
```yaml
apiVersion: v1
kind: Service
metadata:
  name: ${APP_NAME}
  labels:
    app.kubernetes.io/name: ${APP_NAME}
spec:
  # WHY: ClusterIP is safest default - internal only
  # Use NodePort for dev/testing, LoadBalancer for prod external access
  type: ClusterIP
  ports:
    # WHY: Service abstracts internal port - clients connect to :80, Pod runs on :8080
    # This allows standard external ports while container runs unprivileged
    - port: 80              # WHY: Service port (what clients connect to)
      targetPort: ${PORT}   # WHY: Pod port (>=1024, e.g., 8080)
      protocol: TCP
      name: http
  selector:
    # CRITICAL: Must EXACTLY match Pod template labels from Deployment
    # Mismatch = zero endpoints = Service routes to nothing
    app.kubernetes.io/name: ${APP_NAME}
```
Verify the Service→Pod connection with `kubectl get endpoints ${APP_NAME}`:
- Shows Pod IPs if the selector matches
- Shows `<none>` if the selector MISMATCHES the Pod labels
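For the supporting HPA and PDB, a minimal sketch (thresholds and replica counts are illustrative, not skill defaults):

```yaml
# HPA: scales on CPU utilization; requires resource requests to be set
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ${APP_NAME}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ${APP_NAME}
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # illustrative threshold
---
# PDB: keeps at least one pod up during voluntary disruptions (node drains)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ${APP_NAME}
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: ${APP_NAME}
```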
Security Context (Always Applied)
See references/security-contexts.md for full patterns.
```yaml
# Pod level
securityContext:
  runAsNonRoot: true      # WHY: Never run as root
  runAsUser: 1000         # WHY: Consistent non-root UID
  runAsGroup: 1000        # WHY: Consistent GID
  fsGroup: 1000           # WHY: Volume permissions
  seccompProfile:
    type: RuntimeDefault  # WHY: Block dangerous syscalls

# Container level
securityContext:
  allowPrivilegeEscalation: false  # WHY: Prevent root escalation
  readOnlyRootFilesystem: true     # WHY: Immutable container
  capabilities:
    drop: ["ALL"]                  # WHY: Minimal capabilities
```
Output Checklist
Before delivering, verify:
Pre-flight
- [ ] kubectl context is valid
- [ ] Namespace exists or was created
- [ ] Image exists locally or in registry
- [ ] For kind/minikube: image loaded into cluster
Manifests
- [ ] All manifests have `app.kubernetes.io/*` labels
- [ ] Security context applied (runAsNonRoot, readOnlyRootFilesystem)
- [ ] containerPort >= 1024 (privileged ports incompatible with runAsNonRoot)
- [ ] Resource requests AND limits defined
- [ ] Liveness and readiness probes configured
- [ ] No hardcoded secrets (use Secret references or env vars)
Namespace Governance (if applicable)
- [ ] ResourceQuota sets namespace-wide CPU/memory/pod limits
- [ ] LimitRange provides default requests/limits for containers
- [ ] LimitRange max prevents single container from consuming quota
- [ ] NetworkPolicy isolates namespace (default-deny + explicit allows)
- [ ] Monitoring namespace allowed to scrape metrics
Validation
- [ ] `kubectl apply --dry-run=server` passes
- [ ] Deployed to cluster successfully
- [ ] Pods reach Running state
- [ ] Health endpoints respond
- [ ] Service has endpoints (`kubectl get endpoints` shows Pod IPs, not `<none>`)
Documentation
- [ ] Comments explain WHY for each config choice
- [ ] README.md with deployment instructions
Reference Files
Always Read First
| File | Purpose |
|---|---|
| references/security-contexts.md | CRITICAL: Hardened security patterns |
| references/health-probes.md | CRITICAL: Liveness/readiness/startup |
| references/resource-limits.md | CRITICAL: CPU/memory guidance |
| references/namespace-governance.md | CRITICAL: ResourceQuota, LimitRange, NetworkPolicy, multi-team isolation |
Debugging & Operations
| File | When to Read |
|---|---|
| references/debugging-workflow.md | CRITICAL: CrashLoopBackOff, command safety, logs, exec, debug containers |
| references/deployment-gotchas.md | CRITICAL: Architecture mismatch, ImagePull failures, pre-deploy validation, Helm gotchas |
| references/networking-patterns.md | DEBUGGING: Service has no endpoints, selector mismatch, DNS issues |
| references/control-plane.md | DEBUGGING: When deployments fail, pods stuck, rollback needed |
Workload-Specific
| File | When to Read |
|---|---|
| references/workload-types.md | Choosing Deployment vs Job vs StatefulSet |
| references/init-sidecar-patterns.md | Init containers (model download, db wait), sidecars (logging, metrics) |
| references/autoscaling-patterns.md | HPA, custom metrics, KEDA |
| references/gpu-workloads.md | AI/ML workloads with GPU |
| references/keda-patterns.md | Event-driven scale-to-zero |
Infrastructure
| File | When to Read |
|---|---|
| references/networking-patterns.md | Service types, Ingress, mesh |
| references/storage-patterns.md | PVC, ephemeral, shared storage |
| references/configmap-patterns.md | ConfigMap creation, env vars, volumes, hot-reload |
| references/secrets-patterns.md | ESO, Sealed Secrets, K8s Secrets |
| references/rbac-patterns.md | SECURITY: ServiceAccount, Role, RoleBinding, least privilege |
| references/labels-annotations.md | Standard labels, ArgoCD compat |
# Supported AI Coding Agents

This skill follows the SKILL.md standard and works with all major AI coding agents.