404kidwiz

azure-infra-engineer

# Install this skill:
npx skills add 404kidwiz/claude-supercode-skills --skill "azure-infra-engineer"

Install specific skill from multi-skill repository

# Description

Expert in Microsoft Azure cloud services, specializing in Bicep/ARM templates, Enterprise Landing Zones, and Cloud Adoption Framework (CAF).

# SKILL.md

---
name: azure-infra-engineer
description: Expert in Microsoft Azure cloud services, specializing in Bicep/ARM templates, Enterprise Landing Zones, and Cloud Adoption Framework (CAF).
---

Azure Infrastructure Engineer

Purpose

Provides Microsoft Azure cloud expertise specializing in Bicep/ARM templates, Enterprise Landing Zones, and Cloud Adoption Framework (CAF) implementations. Designs and deploys enterprise-grade Azure environments with governance, networking, and infrastructure as code.

When to Use

  • Deploying Azure resources using Bicep or ARM templates
  • Designing Hub-and-Spoke network topologies (Virtual WAN, ExpressRoute)
  • Implementing Azure Policy and Management Groups (Governance)
  • Migrating workloads to Azure (ASR, Azure Migrate)
  • Automating Azure DevOps pipelines for infrastructure
  • Configuring Microsoft Entra ID (formerly Azure Active Directory) RBAC and PIM

---

2. Decision Framework

IaC Tool Selection (Azure Context)

| Tool | Status | Recommendation |
|---|---|---|
| Bicep | Recommended | Native, first-class support, concise syntax. |
| Terraform | Alternative | Best for multi-cloud strategies. |
| ARM Templates | Legacy | Verbose JSON. Avoid for new projects (author in Bicep, which compiles to ARM JSON). |
| PowerShell/CLI | Scripting | Use for ad-hoc tasks or pipeline glue, not state management. |

Networking Architecture

What is the connectivity need?
│
├─ **Hub-and-Spoke** (Standard)
│  ├─ Central Hub: Firewall, VPN Gateway, Bastion
│  └─ Spokes: Workload VNets (Peered to Hub)
│
├─ **Virtual WAN** (Global Scale)
│  ├─ Multi-region connectivity? → **Yes**
│  └─ Branch-to-Branch (SD-WAN)? → **Yes**
│
└─ **Private Access**
   ├─ PaaS Services? → **Private Link / Private Endpoints**
   └─ Service Endpoints? → Legacy (Use Private Link where possible)
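
For the Hub-and-Spoke branch, the peering between a hub and one spoke might look like the sketch below. The VNet names are hypothetical, and both VNets are assumed to already exist in the resource group being deployed to (in a real landing zone they usually live in different subscriptions, which needs a module per scope).

```bicep
// Hypothetical VNet names; both are assumed to exist in this resource group.
param hubVnetName string = 'vnet-hub-prod-eus-001'
param spokeVnetName string = 'vnet-app1-prod-eus-001'

resource hub 'Microsoft.Network/virtualNetworks@2023-04-01' existing = {
  name: hubVnetName
}

resource spoke 'Microsoft.Network/virtualNetworks@2023-04-01' existing = {
  name: spokeVnetName
}

// Peering is directional, so one peering resource is declared on each side.
resource hubToSpoke 'Microsoft.Network/virtualNetworks/virtualNetworkPeerings@2023-04-01' = {
  parent: hub
  name: 'peer-hub-to-app1'
  properties: {
    remoteVirtualNetwork: {
      id: spoke.id
    }
    allowVirtualNetworkAccess: true
    allowForwardedTraffic: true
    allowGatewayTransit: true // lets spokes use the hub's VPN/ExpressRoute gateway
  }
}

resource spokeToHub 'Microsoft.Network/virtualNetworks/virtualNetworkPeerings@2023-04-01' = {
  parent: spoke
  name: 'peer-app1-to-hub'
  properties: {
    remoteVirtualNetwork: {
      id: hub.id
    }
    allowVirtualNetworkAccess: true
    allowForwardedTraffic: true
    useRemoteGateways: false // switch to true only once a gateway exists in the hub
  }
}
```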

Governance Strategy (CAF)

  1. Management Groups: Hierarchy for policy inheritance (Root > Geo > Landing Zones).
  2. Azure Policy: Use the Deny effect to block non-compliant resources (e.g., allow deployments only in East US).
  3. RBAC: Least privilege access via Entra ID Groups.
  4. Blueprints: Rapid deployment of compliant environments. Azure Blueprints is being deprecated in favor of Template Specs and Deployment Stacks, so prefer those for new work.
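
As an illustration of the Azure Policy item, the sketch below assigns the built-in "Allowed locations" policy at management group scope so that only East US is permitted. The assignment name is hypothetical, and the built-in definition ID should be verified in your own tenant before use.

```bicep
targetScope = 'managementGroup'

// Built-in "Allowed locations" policy definition (verify the ID in your tenant).
var allowedLocationsPolicyId = '/providers/Microsoft.Authorization/policyDefinitions/e56962a6-4747-49cd-b67b-bf8b01975c4c'

resource allowedLocations 'Microsoft.Authorization/policyAssignments@2022-06-01' = {
  name: 'allowed-locations' // hypothetical assignment name
  properties: {
    displayName: 'Allowed locations: East US only'
    policyDefinitionId: allowedLocationsPolicyId
    parameters: {
      listOfAllowedLocations: {
        value: [
          'eastus'
        ]
      }
    }
  }
}
```

A template like this would typically be deployed with `az deployment mg create`, targeting the management group that should inherit the restriction.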

Red Flags → Escalate to security-engineer:
- Public access enabled on Storage Accounts or SQL Databases
- Management Ports (RDP/SSH) open to internet
- Subscription Owner permissions granted to individual users (use Contributor or time-bound PIM assignments instead)
- No cost controls/budgets configured

---

4. Core Workflows

Workflow 1: Bicep Resource Deployment

Goal: Deploy a secure Storage Account with Private Endpoint.

Steps:

  1. Define Bicep Module (storage.bicep)
    ```bicep
    param location string = resourceGroup().location
    param name string

    resource stg 'Microsoft.Storage/storageAccounts@2023-01-01' = {
      name: name
      location: location
      sku: { name: 'Standard_LRS' }
      kind: 'StorageV2'
      properties: {
        minimumTlsVersion: 'TLS1_2'
        supportsHttpsTrafficOnly: true
        publicNetworkAccess: 'Disabled' // Secure by default
      }
    }

    output id string = stg.id
    ```

  2. Main Deployment (main.bicep)
    ```bicep
    module storage './modules/storage.bicep' = {
      name: 'deployStorage'
      params: {
        name: 'stappprod001'
      }
    }
    ```

  3. Deploy via CLI
    ```bash
    az deployment group create --resource-group rg-prod --template-file main.bicep
    ```
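
The goal of this workflow mentions a Private Endpoint, which the steps above do not define. A minimal sketch of one follows, assuming a subnet already exists and that its resource ID is passed in. The parameter names and the endpoint name are hypothetical.

```bicep
param location string = resourceGroup().location
param storageAccountId string // e.g. the `id` output of storage.bicep
param subnetId string         // hypothetical: resource ID of an existing subnet

resource pe 'Microsoft.Network/privateEndpoints@2023-04-01' = {
  name: 'pe-stappprod001-blob'
  location: location
  properties: {
    subnet: {
      id: subnetId
    }
    privateLinkServiceConnections: [
      {
        name: 'blob'
        properties: {
          privateLinkServiceId: storageAccountId
          groupIds: [
            'blob' // sub-resource of the storage account to expose
          ]
        }
      }
    ]
  }
}
```

Name resolution for the endpoint additionally needs a Private DNS Zone (`privatelink.blob.core.windows.net`) linked to the VNet, which is not shown here.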

---

Workflow 3: Landing Zone Setup (CAF)

Goal: Establish the foundational hierarchy.

Steps:

  1. Create Management Groups

    • MG-Root
      • MG-Platform (Identity, Connectivity, Management)
      • MG-LandingZones (Online, Corp)
      • MG-Sandbox (Playground)
  2. Assign Policies

    • Assign "Allowed Locations" to MG-Root.
    • Assign "Enable Azure Monitor" to MG-LandingZones.
  3. Deploy Hub Network

    • Deploy VNet in connectivity subscription.
    • Deploy Azure Firewall and VPN Gateway.
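
Step 1 can be expressed as a tenant-scope Bicep file. The sketch below reuses the names from the hierarchy above as management group IDs, which is an assumption; adjust them to your own convention.

```bicep
targetScope = 'tenant'

// Top of the custom hierarchy; with no parent specified it is created
// under the tenant root group.
resource root 'Microsoft.Management/managementGroups@2021-04-01' = {
  name: 'MG-Root'
  properties: {
    displayName: 'MG-Root'
  }
}

var childNames = [
  'MG-Platform'
  'MG-LandingZones'
  'MG-Sandbox'
]

// One child management group per entry, each parented to MG-Root.
resource children 'Microsoft.Management/managementGroups@2021-04-01' = [for childName in childNames: {
  name: childName
  properties: {
    displayName: childName
    details: {
      parent: {
        id: root.id
      }
    }
  }
}]
```

Tenant-scope deployments go through `az deployment tenant create` and require permissions at the tenant root, which most accounts do not have by default.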

---

5. Anti-Patterns & Gotchas

โŒ Anti-Pattern 1: "ClickOps"

What it looks like:
- Creating resources manually in the Azure Portal.

Why it fails:
- Unrepeatable.
- Configuration drift.
- Disaster recovery is slow and error-prone (no code to redeploy from).

Correct approach:
- Everything as Code: Even if prototyping, export the ARM template or write basic Bicep.

โŒ Anti-Pattern 2: One Giant Resource Group

What it looks like:
- rg-production contains VNets, VMs, Databases, and Web Apps for 5 different projects.

Why it fails:
- IAM nightmare (cannot grant resource-group-level access to Project A without also exposing Project B).
- Tagging and cost analysis become difficult.
- Risk of accidental deletion.

Correct approach:
- Lifecycle Grouping: Group resources that share a lifecycle (e.g., rg-network, rg-app1-prod, rg-app1-dev).
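
A subscription-scope sketch of lifecycle grouping, creating one resource group per lifecycle. The group names come from the example above; the default location is an assumption.

```bicep
targetScope = 'subscription'

param location string = 'eastus' // assumed region

var groupNames = [
  'rg-network'
  'rg-app1-prod'
  'rg-app1-dev'
]

// One resource group per lifecycle, so access and deletion can be scoped separately.
resource rgs 'Microsoft.Resources/resourceGroups@2022-09-01' = [for groupName in groupNames: {
  name: groupName
  location: location
}]
```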

โŒ Anti-Pattern 3: Ignoring Naming Conventions

What it looks like:
- myvm1, test-storage, sql-server.

Why it fails:
- Cannot identify resource type, environment, or region from name.
- Name collisions (Storage accounts must be globally unique).

Correct approach:
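- CAF Naming Standard: [Resource Type]-[Workload]-[Environment]-[Region]-[Instance]
- Example: vnet-myapp-prod-eus-001 (Virtual Network, MyApp, Prod, East US, 001). Storage account names allow only 3-24 lowercase letters and digits, so drop the hyphens there: stmyappprodeus001.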
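
One way to keep names consistent is to build them from parameters in Bicep rather than typing them by hand. The sketch below is illustrative only; the parameter names and default values are assumptions.

```bicep
param workload string = 'myapp'
param env string = 'prod'
param regionCode string = 'eus'
param instance string = '001'

// Hyphenated pattern for resource types that allow hyphens.
var vnetName = 'vnet-${workload}-${env}-${regionCode}-${instance}'

// Storage accounts: lowercase letters and digits only, at most 24 characters.
var storageName = take(toLower('st${workload}${env}${regionCode}${instance}'), 24)

output vnetName string = vnetName       // 'vnet-myapp-prod-eus-001' with the defaults
output storageName string = storageName // 'stmyappprodeus001' with the defaults
```

Because storage account names must also be globally unique, a short suffix such as one derived from `uniqueString(resourceGroup().id)` is often appended in practice.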

---

7. Quality Checklist

Governance:
- [ ] Naming: Resources follow CAF naming conventions.
- [ ] Tagging: Resources tagged with CostCenter, Environment, Owner.
- [ ] Policies: Azure Policy enforces compliance (e.g., allowed SKUs).

Security:
- [ ] Network: No public IPs on backend resources (VMs, DBs).
- [ ] Identity: Managed Identities used instead of Service Principals/Keys where possible.
- [ ] Encryption: CMK (Customer Managed Keys) enabled for sensitive data.

Reliability:
- [ ] Availability Zones: Critical resources deployed zone-redundant (e.g., ZRS for storage, zone-redundant SKUs for other services).
- [ ] Backup: Azure Backup enabled for VMs and SQL.
- [ ] Locks: Resource Locks (CanNotDelete) on critical production resources.

Cost:
- [ ] Sizing: Resources right-sized based on metrics.
- [ ] Reservations: Reserved Instances purchased for steady workloads.
- [ ] Cleanup: Unused resources (orphaned disks/NICs) deleted.
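
The Locks item in the checklist above can be covered in code. This sketch applies a CanNotDelete lock to an existing storage account; the lock name is hypothetical.

```bicep
param storageAccountName string

resource stg 'Microsoft.Storage/storageAccounts@2023-01-01' existing = {
  name: storageAccountName
}

// Extension resource scoped to the storage account it protects.
resource deleteLock 'Microsoft.Authorization/locks@2020-05-01' = {
  name: 'lock-do-not-delete'
  scope: stg
  properties: {
    level: 'CanNotDelete'
    notes: 'Protects a critical production resource from accidental deletion.'
  }
}
```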

Examples

Example 1: Multi-Subscription Landing Zone Setup

Scenario: A healthcare company needs to deploy a compliant landing zone for HIPAA-regulated workloads across three environments (dev, staging, prod).

Architecture:
1. Management Group Hierarchy: Root > Organization > Environments > Workloads
2. Network Design: Hub-and-spoke with Azure Firewall, separate VNets per environment
3. Policy Enforcement: Azure Policy to enforce HIPAA compliance (encryption, backup, private endpoints)
4. CI/CD Pipeline: Azure DevOps pipeline with approval gates for prod deployments

Key Components:
- Azure Firewall Manager for centralized policy
- Private DNS Zones for app-internal resolution
- Azure Backup with immutable vaults for compliance
- Cost Management tags for departmental chargebacks

Example 2: Zero-Trust Network Architecture

Scenario: A financial services firm needs to replace their VPN-based access with a Zero Trust architecture using Azure Private Link and Conditional Access.

Implementation:
1. Private Endpoints: All PaaS services accessed via Private Endpoints (SQL, Storage, Key Vault)
2. Identity-Based Access: Conditional Access policies requiring compliant device and MFA
3. Micro-segmentation: NSG rules denying all traffic by default, allowing only required flows
4. Monitoring: Microsoft Sentinel (formerly Azure Sentinel) for security analytics and anomaly detection

Security Controls:
- Microsoft Entra Conditional Access with device compliance
- Just-In-Time VM access for administration
- Microsoft Defender for Cloud threat protection
- Comprehensive audit logging to Log Analytics
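
The micro-segmentation step could be sketched as an NSG with one allow rule and an explicit deny-all. The NSG name, source prefix, and port are hypothetical placeholders for whatever flows the workload actually needs.

```bicep
param location string = resourceGroup().location

resource nsg 'Microsoft.Network/networkSecurityGroups@2023-04-01' = {
  name: 'nsg-app1-prod-eus-001' // hypothetical name
  location: location
  properties: {
    securityRules: [
      {
        name: 'Allow-Https-From-Frontend'
        properties: {
          priority: 100
          direction: 'Inbound'
          access: 'Allow'
          protocol: 'Tcp'
          sourceAddressPrefix: '10.0.1.0/24' // hypothetical frontend subnet
          sourcePortRange: '*'
          destinationAddressPrefix: '*'
          destinationPortRange: '443'
        }
      }
      {
        // Lowest-priority custom rule; overrides the default AllowVnetInBound rule.
        name: 'Deny-All-Inbound'
        properties: {
          priority: 4096
          direction: 'Inbound'
          access: 'Deny'
          protocol: '*'
          sourceAddressPrefix: '*'
          sourcePortRange: '*'
          destinationAddressPrefix: '*'
          destinationPortRange: '*'
        }
      }
    ]
  }
}
```

An explicit deny-all also overrides the default rule that admits Azure Load Balancer health probes, so add an allow rule for the `AzureLoadBalancer` service tag if the subnet sits behind a load balancer.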

Example 3: Cost-Optimized Dev/Test Environment

Scenario: A software company wants to reduce their Azure dev/test environment costs by 60% while maintaining developer productivity.

Optimization Strategy:
1. Auto-Shutdown: Dev VMs auto-shutdown evenings and weekends via Automation Runbooks
2. Reserved Capacity: Prod-like dev environments use Reserved Instances
3. Dev-Optimized SKUs: Development uses Dev/Test SKUs where available
4. Tagging and Governance: Required tags for cost allocation, orphaned resource cleanup

Cost Savings Results:
- 65% reduction in dev/test compute costs
- Automated cleanup of unused resources saving $2K/month
- Reserved Instance savings for stable environments
- Developer productivity maintained with auto-start capabilities

Best Practices

Infrastructure as Code

  • Everything as Code: Every resource defined in Bicep, never manual portal changes
  • Module Library: Create reusable Bicep modules for common patterns
  • Parameter Files: Separate parameter files per environment (dev, staging, prod)
  • GitOps Workflow: Infrastructure changes via PR and approval process
  • State Management: Bicep keeps no state file (Azure Resource Manager holds the deployed state); with Terraform, use a remote backend such as an Azure Storage account
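
A per-environment parameter file might look like the sketch below. The file name is hypothetical, and it assumes `main.bicep` declares a `param storageName string`, which the earlier workflow does not show.

```bicep
// prod.bicepparam (hypothetical file name)
using './main.bicep'

// Assumes main.bicep declares `param storageName string`.
param storageName = 'stappprod001'
```

The file can then be passed to `az deployment group create` through `--parameters prod.bicepparam`, with a sibling file per environment.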

Networking Excellence

  • Hub-and-Spoke Default: Standard architecture for most workloads
  • Private by Default: All PaaS access via Private Endpoints
  • DNS Planning: Private DNS Zones with VNet links, avoid host file modifications
  • Firewall Integration: Centralized threat protection with Azure Firewall
  • Hybrid Connectivity: ExpressRoute for production, VPN as a backup path

Security Hardening

  • Least Privilege: RBAC with specific roles, avoid Subscription Owner
  • Managed Identities: Prefer over Service Principals with secrets
  • Secrets Management: Key Vault for all secrets, never environment variables
  • Encryption Everywhere: CMK for sensitive data, TLS 1.2+ everywhere
  • Network Isolation: NSG rules denying by default, allow-listing required traffic
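
For the Least Privilege item, a role assignment at resource group scope to a group (rather than Owner on a subscription for an individual) could look like this. It is a sketch that assumes the object ID of an existing Entra ID security group is supplied by the caller.

```bicep
// Assumption: object ID of an existing Entra ID security group.
param groupObjectId string

// Built-in Reader role definition ID.
var readerRoleId = 'acdd72a7-3385-48ef-bd42-f606fba81ae7'

resource readerAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  // Deterministic GUID so redeployments update the same assignment.
  name: guid(resourceGroup().id, groupObjectId, readerRoleId)
  properties: {
    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', readerRoleId)
    principalId: groupObjectId
    principalType: 'Group'
  }
}
```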

Cost Management

  • Right-Sizing: Regular review of actual utilization vs allocated size
  • Reservation Planning: Identify stable workloads for Reserved Instances
  • Auto-Shutdown: Dev/test resources off during off-hours
  • Tagging Strategy: Required tags for cost center, environment, owner
  • Budget Alerts: Budget thresholds with alerts at 50%, 75%, 90%
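
The budget thresholds above can be declared in Bicep at subscription scope. In this sketch the budget name, amount, and notification keys are assumptions, and the start date must be the first day of a month.

```bicep
targetScope = 'subscription'

param amount int = 1000   // assumed monthly budget in the billing currency
param startDate string    // first day of a month, in YYYY-MM-DD form
param contactEmails array // recipients of the alert emails

resource budget 'Microsoft.Consumption/budgets@2021-10-01' = {
  name: 'budget-monthly' // hypothetical name
  properties: {
    category: 'Cost'
    amount: amount
    timeGrain: 'Monthly'
    timePeriod: {
      startDate: startDate
    }
    notifications: {
      actualOver50: {
        enabled: true
        operator: 'GreaterThan'
        threshold: 50
        contactEmails: contactEmails
      }
      actualOver75: {
        enabled: true
        operator: 'GreaterThan'
        threshold: 75
        contactEmails: contactEmails
      }
      actualOver90: {
        enabled: true
        operator: 'GreaterThan'
        threshold: 90
        contactEmails: contactEmails
      }
    }
  }
}
```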

Governance and Compliance

  • Policy as Guardrails: Azure Policy for prevention, not just detection
  • Management Groups: Hierarchy reflecting organizational structure
  • Template Specs and Deployment Stacks: Package standard compliant environments (these replace Azure Blueprints, which is being deprecated)
  • Monitoring Strategy: Centralized logging to Log Analytics workspace
  • Automation: Runbooks for routine operational tasks

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.