acedergren

OCI Infrastructure as Code

# Install this skill:
npx skills add acedergren/oci-agent-skills --skill "OCI Infrastructure as Code"

Install specific skill from multi-skill repository

# Description

Use when writing Terraform for OCI, troubleshooting provider errors, managing state files, or implementing Resource Manager stacks. Covers terraform-provider-oci gotchas, resource lifecycle anti-patterns, state management mistakes, authentication issues, and OCI Landing Zones. Keywords: terraform state, provider authentication, resource already exists, destroy failed, instance principal terraform.

# SKILL.md


name: OCI Infrastructure as Code
description: Use when writing Terraform for OCI, troubleshooting provider errors, managing state files, or implementing Resource Manager stacks. Covers terraform-provider-oci gotchas, resource lifecycle anti-patterns, state management mistakes, authentication issues, and OCI Landing Zones. Keywords: terraform state, provider authentication, resource already exists, destroy failed, instance principal terraform.
version: 2.0.0


OCI Infrastructure as Code - Expert Knowledge

🏗️ IMPORTANT: Use OCI Landing Zone Terraform Modules

Do NOT Reinvent the Wheel

❌ WRONG Approach:

# Writing Terraform from scratch for every resource
resource "oci_identity_compartment" "prod" { ... }
resource "oci_core_vcn" "main" { ... }
resource "oci_identity_policy" "policies" { ... }
# Result: Unmaintainable, inconsistent, no governance

✅ RIGHT Approach: Use Official OCI Landing Zone Terraform Modules

# Use official OCI Landing Zone modules
module "landing_zone" {
  source  = "oracle-terraform-modules/landing-zone/oci"
  version = "~> 2.0"

  # Infrastructure configuration
  compartments_configuration = { ... }
  network_configuration = { ... }
  security_configuration = { ... }
}

Why Use Landing Zone Modules:
- ✅ Battle-tested: Thousands of OCI customers
- ✅ Compliant: CIS OCI Foundations Benchmark aligned
- ✅ Maintained: Oracle updates for API changes
- ✅ Comprehensive: Includes IAM, networking, security, logging
- ✅ Reusable: Consistent patterns across environments

Official Resources:
- OCI Landing Zone Terraform Modules
- OCI Resource Manager Stacks

When to Write Custom Terraform (this skill's guidance):
- Application-specific resources not covered by landing zone
- Extending landing zone modules
- Special requirements not in reference architecture


⚠️ OCI CLI/API Knowledge Gap

You don't know OCI CLI commands or OCI API structure.

Your training data has limited and outdated knowledge of:
- OCI Terraform provider syntax (updates frequently)
- OCI API endpoints and resource schemas
- terraform-provider-oci specific arguments and data sources
- Resource Manager stack operations
- Latest provider features and breaking changes

When OCI operations are needed:
1. Use exact Terraform examples from this skill's references
2. Do NOT guess OCI provider resource arguments
3. Do NOT assume AWS/Azure Terraform patterns work in OCI
4. Reference landing-zones skill for module usage

What you DO know:
- General Terraform concepts and HCL syntax
- State management principles
- Infrastructure as Code best practices

This skill bridges the gap by providing current OCI-specific Terraform patterns and gotchas.


You are an OCI Terraform expert. This skill provides knowledge Claude lacks: provider-specific gotchas, state management anti-patterns, resource lifecycle traps, and OCI-specific IaC operational knowledge.

NEVER Do This

❌ NEVER hardcode OCIDs in Terraform (breaks portability)

# WRONG - breaks when moving between regions/compartments
resource "oci_core_instance" "web" {
  compartment_id = "ocid1.compartment.oc1..aaaaaa..."  # Hardcoded!
  subnet_id      = "ocid1.subnet.oc1.phx.bbbbbb..."     # Hardcoded!
}

# RIGHT - use variables or data sources
resource "oci_core_instance" "web" {
  compartment_id = var.compartment_ocid
  subnet_id      = data.oci_core_subnet.existing.id
}
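
The RIGHT example above references `var.compartment_ocid` and `data.oci_core_subnet.existing` without declaring them. A minimal sketch of those declarations follows. The `subnet_ocid` variable and the validation rule are assumptions added for illustration, not part of the original skill.

```hcl
# Compartment OCID supplied per environment (tfvars file, TF_VAR_compartment_ocid, ...)
variable "compartment_ocid" {
  type        = string
  description = "OCID of the compartment that owns the instance"

  validation {
    # Illustrative prefix check; a tenancy OCID is accepted because the
    # root compartment is addressed by the tenancy OCID.
    condition     = can(regex("^ocid1\\.(compartment|tenancy)\\.", var.compartment_ocid))
    error_message = "The compartment_ocid value must be a compartment or tenancy OCID."
  }
}

# Hypothetical variable: OCID of a subnet managed outside this configuration
variable "subnet_ocid" {
  type        = string
  description = "OCID of an existing subnet"
}

# Look up the existing subnet so its other attributes are available as well
data "oci_core_subnet" "existing" {
  subnet_id = var.subnet_ocid
}
```

The OCIDs still have to come from somewhere, but with this layout they live in per-environment variable files rather than in resource blocks.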

❌ NEVER use preserve_boot_volume = true in dev/test (cost trap)

# WRONG - orphans boot volumes when instance destroyed ($50+/month per instance)
resource "oci_core_instance" "dev" {
  preserve_boot_volume = true  # Default behavior!
}

# RIGHT - explicit cleanup in dev/test
resource "oci_core_instance" "dev" {
  preserve_boot_volume = false
}

Cost impact: Dev team with 10 test instances × $5/volume/month = $50/month wasted on orphaned volumes

❌ NEVER forget lifecycle blocks for critical resources

# WRONG - accidental destroy can delete production database
resource "oci_database_autonomous_database" "prod" {
  # No protection!
}

# RIGHT - prevent accidental destruction
resource "oci_database_autonomous_database" "prod" {
  lifecycle {
    prevent_destroy = true
    ignore_changes  = [defined_tags]  # Ignore tag changes from console
  }
}

❌ NEVER mix regional and AD-specific resources (portability trap)

# WRONG - hardcoded AD breaks multi-region deployment
resource "oci_core_instance" "web" {
  availability_domain = "fMgC:US-ASHBURN-AD-1"  # Tenant-specific!
}

# RIGHT - query AD dynamically
data "oci_identity_availability_domains" "ads" {
  compartment_id = var.tenancy_ocid
}

resource "oci_core_instance" "web" {
  availability_domain = data.oci_identity_availability_domains.ads.availability_domains[0].name
}
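
Indexing `availability_domains[0]` always picks the first AD. The sketch below is one way to make the choice configurable while staying valid in single-AD regions. It builds on the `oci_identity_availability_domains.ads` data source shown above; the `ad_number` variable and the local names are hypothetical.

```hcl
# Hypothetical input: 1-based AD number (AD-1, AD-2, ...)
variable "ad_number" {
  type    = number
  default = 1

  validation {
    condition     = var.ad_number >= 1 && floor(var.ad_number) == var.ad_number
    error_message = "The ad_number value must be a whole number of 1 or greater."
  }
}

locals {
  # Names of every AD the data source returns for the provider's region
  ad_names = data.oci_identity_availability_domains.ads.availability_domains[*].name

  # Clamp to the last available AD so single-AD regions still resolve
  ad_name = local.ad_names[min(var.ad_number, length(local.ad_names)) - 1]
}
```

An instance would then set `availability_domain = local.ad_name`.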

❌ NEVER store state file in local filesystem for teams

# WRONG - no locking, no collaboration
terraform {
  backend "local" {}
}

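# RIGHT - remote state in OCI Object Storage via the S3-compatible API
# (this block alone does not configure state locking; coordinate applies, e.g. through CI)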
terraform {
  backend "s3" {
    bucket   = "terraform-state"
    key      = "prod/terraform.tfstate"
    region   = "us-phoenix-1"
    endpoint = "https://namespace.compat.objectstorage.us-phoenix-1.oraclecloud.com"

    skip_region_validation      = true
    skip_credentials_validation = true
    skip_metadata_api_check     = true
    use_path_style              = true
  }
}

❌ NEVER use count for resources that shouldn't be replaced on reorder

# WRONG - reordering list recreates ALL resources
resource "oci_core_instance" "web" {
  count = length(var.instance_names)
  display_name = var.instance_names[count.index]
}

# If instance_names changes from ["web1", "web2", "web3"] to ["web0", "web1", "web2", "web3"],
# every index shifts and Terraform plans to modify or replace ALL existing instances!

# RIGHT - use for_each with stable keys
resource "oci_core_instance" "web" {
  for_each = toset(var.instance_names)
  display_name = each.value
}
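
Switching an existing resource from count to for_each changes every resource address, which on its own would plan a destroy and a create for each instance. The sketch below, assuming Terraform 1.1 or later and the instance names from the example above, uses `moved` blocks to map the old indexed addresses to the new keyed ones.

```hcl
# Map each old count index to the for_each key that now owns that instance.
moved {
  from = oci_core_instance.web[0]
  to   = oci_core_instance.web["web1"]
}

moved {
  from = oci_core_instance.web[1]
  to   = oci_core_instance.web["web2"]
}

moved {
  from = oci_core_instance.web[2]
  to   = oci_core_instance.web["web3"]
}
```

After adding the blocks, `terraform plan` should report the addresses as moved rather than listing destroys; the blocks can be dropped once every environment has applied them.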

OCI Provider Gotchas

Authentication Hierarchy (Often Confusing)

Provider authentication precedence:
1. Explicit provider block credentials
2. TF_VAR_* environment variables
3. ~/.oci/config file (DEFAULT profile)
4. Instance Principal (if auth = "InstancePrincipal")

Common mistake: Setting environment variables but provider block overrides them silently.
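
One way to avoid the silent override is to keep credentials out of the provider block entirely, so the lower-precedence sources in the list above stay in effect. A minimal sketch, assuming the profile is named `DEFAULT` and that `region` is the only value set explicitly:

```hcl
variable "region" {
  type        = string
  description = "OCI region identifier, for example us-phoenix-1"
}

provider "oci" {
  # No tenancy_ocid, user_ocid, fingerprint, or private_key_path here:
  # those come from environment variables or from ~/.oci/config.
  region              = var.region
  config_file_profile = "DEFAULT" # assumption: profile name in ~/.oci/config
}
```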

Instance Principal for Terraform on OCI Compute

# In provider.tf
provider "oci" {
  auth   = "InstancePrincipal"
  region = var.region
}

# Dynamic group matching rule:
# "ALL {instance.compartment.id = '<compartment-ocid>'}"

# IAM policy:
# "Allow dynamic-group terraform-instances to manage all-resources in tenancy"

Critical: Instance must be in dynamic group BEFORE Terraform runs, or authentication fails with cryptic error: "authorization failed or requested resource not found"
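
The dynamic group and policy from the comments above can themselves be managed in Terraform, as long as they are applied beforehand by a principal that already has IAM permissions (for example a user with an API key), not by the instance that depends on them. A sketch under those assumptions; the resource names and the `runner_compartment_ocid` variable are hypothetical, and tenancies using identity domains may need a different dynamic group reference in the policy statement.

```hcl
variable "tenancy_ocid" {
  type = string
}

# Hypothetical: compartment that holds the instances which run Terraform
variable "runner_compartment_ocid" {
  type = string
}

resource "oci_identity_dynamic_group" "terraform_instances" {
  compartment_id = var.tenancy_ocid # dynamic groups are created at the tenancy level
  name           = "terraform-instances"
  description    = "Compute instances allowed to run Terraform"
  matching_rule  = "ALL {instance.compartment.id = '${var.runner_compartment_ocid}'}"
}

resource "oci_identity_policy" "terraform_instances" {
  compartment_id = var.tenancy_ocid
  name           = "terraform-instances-policy"
  description    = "Permissions for Terraform running with instance principal auth"

  # Referencing the group name creates the dependency on the dynamic group.
  statements = [
    "Allow dynamic-group ${oci_identity_dynamic_group.terraform_instances.name} to manage all-resources in tenancy",
  ]
}
```

IAM resources are written in the tenancy's home region, so the provider used for this configuration may need to point there. The `manage all-resources in tenancy` statement mirrors the comment above; a narrower compartment-scoped statement is usually preferable.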

Resource Already Exists Errors

Error: 409-Conflict, Resource already exists

Cause: Resource exists in OCI but not in state file.

Solution:

# Import existing resource into state
terraform import oci_core_vcn.main ocid1.vcn.oc1.phx.xxxxx

# Then run plan/apply as normal
terraform plan

Prevention: Always use terraform import for existing infrastructure before managing with Terraform.
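
On Terraform 1.5 or later, the same import can be declared in configuration, so it is reviewed in the plan like any other change. A sketch reusing the placeholder OCID from the command above; the VCN arguments are assumptions and must match the real resource.

```hcl
import {
  to = oci_core_vcn.main
  id = "ocid1.vcn.oc1.phx.xxxxx" # placeholder OCID, as in the CLI example
}

resource "oci_core_vcn" "main" {
  compartment_id = var.compartment_ocid
  cidr_blocks    = ["10.0.0.0/16"] # assumption: must match the existing VCN
  display_name   = "main"          # assumption: must match the existing VCN
}
```

If the resource block has not been written yet, `terraform plan -generate-config-out=generated.tf` can draft one from the imported object for review.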

State Management Anti-Patterns

Problem: State Drift

Symptoms: Terraform wants to change/destroy resources that were modified outside Terraform (console, API, CLI).

Detection:

terraform plan  # Unexpected changes indicate drift
terraform show  # Prints the recorded state, for comparison against the console

Solutions:

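Option 1: Refresh state (review the proposed state changes before accepting)

terraform apply -refresh-only  # Updates state to match reality; replaces the deprecated terraform refresh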

Option 2: Import changes (if new resources)

terraform import <resource_type>.<name> <ocid>

Option 3: Ignore changes in lifecycle

lifecycle {
  ignore_changes = [defined_tags, freeform_tags]  # Ignore console tag edits
}

Problem: State File Corruption

Symptoms: terraform plan fails with "state file corrupted" or "version mismatch"

Recovery:

# 1. Make backup
cp terraform.tfstate terraform.tfstate.backup

# 2. Try state repair
terraform state pull > recovered.tfstate
mv recovered.tfstate terraform.tfstate

# 3. If that fails, restore from Object Storage versioning
# Or reconstruct with imports (last resort)

Prevention: Use Object Storage backend with versioning enabled
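
A sketch of a state bucket with versioning enabled, meant to live in a small bootstrap configuration separate from the stacks that store their state in it. The bucket name matches the backend example earlier in this skill; the variable names are assumptions.

```hcl
variable "tenancy_ocid" {
  type = string
}

variable "compartment_ocid" {
  type = string
}

# Object Storage namespace of the tenancy
data "oci_objectstorage_namespace" "ns" {
  compartment_id = var.tenancy_ocid
}

resource "oci_objectstorage_bucket" "terraform_state" {
  compartment_id = var.compartment_ocid
  namespace      = data.oci_objectstorage_namespace.ns.namespace
  name           = "terraform-state"
  access_type    = "NoPublicAccess"
  versioning     = "Enabled" # keeps earlier state versions for recovery

  lifecycle {
    prevent_destroy = true # the state bucket should outlive everything else
  }
}
```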

Resource Lifecycle Traps

Destroy Failures (Common with Dependencies)

Error: Resource still in use

Example: Can't destroy VCN because subnet still exists, can't destroy subnet because instances still attached.

Solution:

# 1. Visualize dependencies
terraform graph | dot -Tpng > graph.png

# 2. Destroy in reverse order
terraform destroy -target=oci_core_instance.web
terraform destroy -target=oci_core_subnet.private
terraform destroy -target=oci_core_vcn.main

# Better: express dependencies through references so Terraform orders the destroy itself:
resource "oci_core_vcn" "main" {
  # ...
}

resource "oci_core_subnet" "private" {
  vcn_id = oci_core_vcn.main.id
  # depends_on is implicit via vcn_id reference
}

Timeouts for Long-Running Resources

# Database provisioning takes 15-30 minutes
resource "oci_database_autonomous_database" "prod" {
  # ... configuration ...

  timeouts {
    create = "60m"  # Default 20m often not enough
    update = "60m"
    delete = "30m"
  }
}

# Compute instance usually fast, but can timeout on capacity issues
resource "oci_core_instance" "web" {
  # ... configuration ...

  timeouts {
    create = "30m"  # Allow retries on "out of capacity"
  }
}

OCI Landing Zones

What: Pre-built Terraform templates for enterprise OCI architectures

Repository: github.com/oracle-quickstart/oci-landing-zones

Use when:
- Starting new OCI tenancy (greenfield)
- Need CIS OCI Foundations Benchmark compliance
- Want security-hardened baseline
- Multi-environment (dev/test/prod) setup

DON'T use when:
- Brownfield (existing infrastructure) - too opinionated
- Simple single-app deployment - overkill

Key patterns:
- Hub-and-spoke networking
- Centralized logging/monitoring
- Security zones and bastion hosts
- IAM baseline with groups/policies

Cost Optimization for IaC

Use Flex Shapes (about 30% savings in this example)

# EXPENSIVE - fixed shape
resource "oci_core_instance" "web" {
  shape = "VM.Standard2.4"  # 4 OCPUs, 60GB RAM, $218/month
}

# CHEAPER - flexible shape
resource "oci_core_instance" "web" {
  shape = "VM.Standard.E4.Flex"
  shape_config {
    ocpus         = 4
    memory_in_gbs = 60
  }
  # Cost: (4 × $0.03 + 60 × $0.0015) × 730 = $153/month (30% savings)
}
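
The arithmetic in the comment above can be kept next to the configuration as locals, so the estimate follows the shape settings when they change. This is a sketch only: the hourly rates and the $218 fixed-shape figure are copied from the example above and should be checked against current OCI pricing.

```hcl
locals {
  # Illustrative rates taken from the comment above, not authoritative pricing
  ocpu_rate_per_hour   = 0.03
  memory_rate_per_hour = 0.0015 # per GB
  hours_per_month      = 730

  flex_ocpus         = 4
  flex_memory_in_gbs = 60

  fixed_monthly = 218 # VM.Standard2.4 figure from the example above

  flex_monthly = (
    local.flex_ocpus * local.ocpu_rate_per_hour +
    local.flex_memory_in_gbs * local.memory_rate_per_hour
  ) * local.hours_per_month

  savings_pct = (local.fixed_monthly - local.flex_monthly) / local.fixed_monthly * 100
}

output "flex_shape_cost_estimate" {
  value = format(
    "flex $%.2f/month vs fixed $%.2f/month (%.0f%% savings)",
    local.flex_monthly, local.fixed_monthly, local.savings_pct
  )
}
```

With the numbers above the output should land at roughly $153 per month and 30% savings, in line with the comment in the example.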

Tag Everything for Cost Tracking

# Define locals for consistent tagging
locals {
  common_tags = {
    "CostCenter"  = "Engineering"
    "Environment" = var.environment
    "ManagedBy"   = "Terraform"
    "Project"     = var.project_name
  }
}

resource "oci_core_instance" "web" {
  freeform_tags = merge(
    local.common_tags,
    {
      "Component" = "WebServer"
    }
  )
}

Benefit: Cost reporting by CostCenter, Environment, Project in OCI Console

Progressive Loading References

OCI Terraform Patterns

WHEN TO LOAD oci-terraform-patterns.md:
- Setting up provider configuration (multi-region, auth methods)
- Resource Manager stack operations via CLI
- Common resource patterns (VCN, compute, ADB)
- State management with Object Storage backend
- Landing Zone module usage examples

Do NOT load for:
- Quick provider gotchas (NEVER list above)
- Understanding when to use Landing Zone (covered above)
- Lifecycle management patterns (covered above)


When to Use This Skill

  • Writing Terraform: provider configuration, resource dependencies, lifecycle
  • State management: drift, corruption, import/export
  • Troubleshooting: authentication failures, "resource already exists", destroy failures
  • OCI Landing Zones: when to use, how to customize
  • Cost optimization: Flex shapes, tagging strategies
  • Production: prevent_destroy, ignore_changes, timeouts
