Install this skill from the multi-skill repository:
npx skills add sgcarstrends/sgcarstrends --skill "monitoring"
# Description
Monitor AWS resources, debug production issues, check Lambda logs, and implement structured logging. Use when investigating errors, checking CloudWatch logs, debugging deployment failures, improving observability, or setting up alarms.
# SKILL.md
name: monitoring
description: Monitor AWS resources, debug production issues, check Lambda logs, and implement structured logging. Use when investigating errors, checking CloudWatch logs, debugging deployment failures, improving observability, or setting up alarms.
allowed-tools: Bash, Read, Edit, Write, Grep
Monitoring Skill
Combines AWS monitoring, CloudWatch logs, and error tracking.
Viewing Logs
SST Console
npx sst console --stage production
npx sst logs --stage production --function api --tail
npx sst logs --stage production --function api --filter "ERROR" --since 1h
AWS CLI
# Tail logs
aws logs tail "/aws/lambda/sgcarstrends-api-production" --follow
# Filter logs
aws logs filter-log-events \
--log-group-name "/aws/lambda/sgcarstrends-api-production" \
--filter-pattern "ERROR" \
--start-time $(date -u -d '1 hour ago' +%s)000
CloudWatch Metrics
# Lambda errors
aws cloudwatch get-metric-statistics \
--namespace AWS/Lambda \
--metric-name Errors \
--dimensions Name=FunctionName,Value=sgcarstrends-api-production \
--start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 300 \
--statistics Sum
# Lambda duration
aws cloudwatch get-metric-statistics \
--namespace AWS/Lambda \
--metric-name Duration \
--dimensions Name=FunctionName,Value=sgcarstrends-api-production \
--start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%S) \
--period 300 \
--statistics Average,Maximum
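The commands above rely on GNU `date` (`-d '1 hour ago'`), which is not available in the BSD `date` shipped with macOS. As a minimal sketch, the same one-hour window can be computed in TypeScript instead. The `timeWindow` helper and the `TimeWindow` type are hypothetical names introduced for illustration and are not part of the skill.

```typescript
// Hypothetical helper (not part of the skill): builds the same time window
// that the shell examples compute with GNU `date`.
interface TimeWindow {
  startIso: string;     // for `get-metric-statistics --start-time`
  endIso: string;       // for `get-metric-statistics --end-time`
  startEpochMs: number; // for `aws logs filter-log-events --start-time`
}

function timeWindow(now: Date, hoursBack: number): TimeWindow {
  const start = new Date(now.getTime() - hoursBack * 60 * 60 * 1000);
  // toISOString() is always UTC; keep the first 19 characters to drop the
  // milliseconds and trailing "Z", matching +%Y-%m-%dT%H:%M:%S.
  const toSeconds = (d: Date): string => d.toISOString().slice(0, 19);
  return {
    startIso: toSeconds(start),
    endIso: toSeconds(now),
    // Truncate to whole seconds, mirroring `$(date +%s)000`.
    startEpochMs: Math.floor(start.getTime() / 1000) * 1000,
  };
}
```

Calling `timeWindow(new Date(), 1)` yields values that can be passed to the `--start-time` and `--end-time` flags shown above.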
CloudWatch Insights Queries
# Find all errors
fields @timestamp, @message, level, error.message
| filter level = "error"
| sort @timestamp desc
| limit 100
# Count errors by type
fields error.name
| filter level = "error"
| stats count() by error.name
# Slow requests (@duration is in milliseconds)
fields @timestamp, @duration
| filter @duration > 1000
| sort @duration desc
# Error rate over time
fields @timestamp
| filter level = "error"
| stats count() as ErrorCount by bin(5m)
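To make the last query concrete, the sketch below reproduces what `stats count() by bin(5m)` computes: each error record is assigned to the 5-minute bucket containing its timestamp, and the records per bucket are counted. The `LogRecord` shape and the `errorCountByBin` function are assumptions made for illustration. They run locally on already-parsed records and do not call CloudWatch.

```typescript
// Hypothetical record shape: a parsed JSON log line with a string level.
interface LogRecord {
  timestamp: number; // epoch milliseconds
  level: string;
}

// Local equivalent of: filter level = "error" | stats count() by bin(5m)
function errorCountByBin(records: LogRecord[], binMinutes = 5): Map<number, number> {
  const binMs = binMinutes * 60 * 1000;
  const counts = new Map<number, number>();
  for (const record of records) {
    if (record.level !== "error") continue;
    // Round the timestamp down to the start of its bin.
    const binStart = Math.floor(record.timestamp / binMs) * binMs;
    counts.set(binStart, (counts.get(binStart) ?? 0) + 1);
  }
  return counts;
}
```

The map keys are bin start times in epoch milliseconds, which correspond to the `bin(5m)` column in the Insights result.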
Structured Logging
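// packages/utils/src/logger.ts
import pino from "pino";

// Assumption: emit the level as a string label (pino defaults to numbers such as 50)
// so that Insights filters like level = "error" match these records.
const logger = pino({
  formatters: { level: (label) => ({ level: label }) },
});

export const log = {
  info: (message: string, data?: Record<string, unknown>) => logger.info(data ?? {}, message),
  error: (message: string, error: Error, data?: Record<string, unknown>) => {
    // Include error.name so the "count errors by type" query has a field to group on.
    logger.error({ ...data, error: { name: error.name, message: error.message, stack: error.stack } }, message);
  },
  warn: (message: string, data?: Record<string, unknown>) => logger.warn(data ?? {}, message),
};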
// Usage
log.info("Fetching cars", { month: "2024-01" });
log.error("Failed to fetch cars", error, { month: "2024-01" });
CloudWatch Alarms
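// infra/alarms.ts
// Assumes AWS CDK v2; SnsAction lives in aws-cloudwatch-actions, not aws-cloudwatch.
import * as cloudwatch from "aws-cdk-lib/aws-cloudwatch";
import * as cloudwatchActions from "aws-cdk-lib/aws-cloudwatch-actions";

// `stack`, `api`, and `alarmTopic` are assumed to be defined elsewhere in the infra code.
new cloudwatch.Alarm(stack, "ApiHighErrorRate", {
  metric: api.metricErrors(),
  threshold: 10,
  evaluationPeriods: 2,
  alarmDescription: "API has high error rate",
}).addAlarmAction(new cloudwatchActions.SnsAction(alarmTopic));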
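Note that the Lambda `Errors` metric is a count, so `threshold: 10` means ten errors per period rather than a percentage. The sketch below illustrates how `threshold` and `evaluationPeriods` interact, assuming a greater-than-or-equal comparison and that every evaluated period must breach. These are believed to match the CDK defaults but are worth confirming for your version. `wouldAlarm` is a hypothetical function that simplifies CloudWatch's real evaluation and ignores its missing-data handling.

```typescript
// Hypothetical simplification of alarm evaluation (not CloudWatch's actual logic).
// `datapoints` holds one metric value per period, oldest first.
function wouldAlarm(
  datapoints: number[],
  threshold: number,
  evaluationPeriods: number,
): boolean {
  // Simplifying assumption: too few datapoints never triggers the alarm.
  if (datapoints.length < evaluationPeriods) return false;
  // Only the most recent `evaluationPeriods` datapoints are considered.
  const recent = datapoints.slice(-evaluationPeriods);
  // Every one of them must be at or above the threshold.
  return recent.every((value) => value >= threshold);
}
```

With the settings shown above, error counts of 12 and 15 in two consecutive periods would trigger the alarm, while a single spike between two quiet periods would not.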
Health Checks
# Test API
curl -f https://api.sgcarstrends.com/health || echo "API unhealthy"
# Test web
curl -f https://sgcarstrends.com || echo "Web unhealthy"
# Database connectivity
psql "$DATABASE_URL" -c "SELECT 1" || echo "Database unreachable"
Debugging Production Issues
# 1. Check recent errors
npx sst logs --stage production --function api --filter "ERROR" --since 1h
# 2. Get Lambda metrics
aws cloudwatch get-metric-statistics --namespace AWS/Lambda --metric-name Errors ...
# 3. Test endpoint directly
curl -v https://api.sgcarstrends.com/health
# 4. Check stack events
aws cloudformation describe-stack-events --stack-name sgcarstrends-api-production --max-items 50
Common Issues
| Issue | Investigation | Solution |
|---|---|---|
| High latency | Check duration metrics, slow queries | Increase memory, optimize queries, add caching |
| High error rate | Check error logs, external services | Fix bugs, add error handling, check rate limits |
| Cold starts | Check init duration, package size | Provisioned concurrency, reduce bundle, ARM |
Best Practices
- Structured Logging: Use JSON format with context
- Log Levels: DEBUG for dev, INFO+ for prod
- Don't Log Secrets: Never log passwords, tokens, keys
- Set Alarms: Monitor error rate and latency
- Log Retention: 7-30 days to balance cost/debugging
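As one way to apply the "Don't Log Secrets" practice, context objects can be scrubbed before they reach the logger. The sketch below is a dependency-free illustration. The `redact` function and the `SENSITIVE_KEY` pattern are hypothetical and the key list is an assumption to adapt. Pino also provides a built-in `redact` option that may be a better fit in practice.

```typescript
// Hypothetical key pattern; extend it to match the field names your code uses.
const SENSITIVE_KEY = /pass(word)?|token|secret|api[-_]?key|authorization/i;

// Returns a copy of plain JSON-like data with sensitive values masked.
// Class instances such as Date are not handled by this sketch.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      // Mask the whole value when the key looks sensitive; otherwise recurse.
      out[key] = SENSITIVE_KEY.test(key) ? "[REDACTED]" : redact(inner);
    }
    return out;
  }
  return value;
}
```

A call such as `log.info("Fetching cars", redact(context) as Record<string, unknown>)` would then keep fields like `month` intact while masking any field whose name matches the pattern.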
References
- CloudWatch: https://docs.aws.amazon.com/cloudwatch
- Lambda Monitoring: https://docs.aws.amazon.com/lambda/latest/dg/monitoring-functions.html
- Pino Logger: https://getpino.io