Terraform is powerful. Terraform at scale across multiple AWS accounts and environments without a wrapper is painful. This post isn’t another “what is Terragrunt” intro — it’s the actual pattern I use in production to manage infrastructure across isolated AWS accounts with a single codebase.

The Problem with Raw Terraform at Scale

If you’ve managed multi-environment Terraform without tooling, you’ve hit the wall. The usual approaches:

  • Copy-paste modules per environment — Works until you have 6 environments and a bug fix requires touching 6 directories.
  • Workspaces — Better, but state isolation is weak and the mental model gets confusing fast. State files share a backend config; one terraform destroy with the wrong workspace selected is a very bad day.
  • Terragrunt — What I use. Keeps modules DRY, enforces consistent backend config, and makes per-environment overrides clean.

Directory Structure

infrastructure/
├── modules/                    # Reusable Terraform modules (no root module)
│   ├── eks-cluster/
│   ├── rds-aurora/
│   ├── ecr-repository/
│   └── vpc/
├── live/                       # Terragrunt root — one dir per env/account
│   ├── terragrunt.hcl          # Root config: backend, provider, common inputs
│   ├── dev/
│   │   ├── account.hcl         # Dev account ID, region
│   │   ├── eks/
│   │   │   └── terragrunt.hcl
│   │   └── rds/
│   │       └── terragrunt.hcl
│   ├── staging/
│   │   ├── account.hcl
│   │   ├── eks/
│   │   │   └── terragrunt.hcl
│   │   └── rds/
│   │       └── terragrunt.hcl
│   └── prod/
│       ├── account.hcl
│       ├── eks/
│       │   └── terragrunt.hcl
│       └── rds/
│           └── terragrunt.hcl
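
Each account.hcl is tiny — just the per-account locals the root config reads. A representative dev/account.hcl might look like this (the account ID and region here are placeholders):

# live/dev/account.hcl
locals {
  account_id  = "111111111111"
  aws_region  = "us-east-1"
  environment = "dev"
}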

The Root terragrunt.hcl

This is the single source of truth for backend configuration. Every child config inherits it — no more copy-pasting S3 bucket names across 20 files.

# live/terragrunt.hcl

locals {
  account_vars = read_terragrunt_config(find_in_parent_folders("account.hcl"))
  account_id   = local.account_vars.locals.account_id
  aws_region   = local.account_vars.locals.aws_region
  environment  = local.account_vars.locals.environment
}

# Generate backend config dynamically per environment
generate "backend" {
  path      = "backend.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
terraform {
  backend "s3" {
    bucket         = "my-org-tfstate-${local.account_id}"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "${local.aws_region}"
    encrypt        = true
    dynamodb_table = "terraform-lock"
    role_arn       = "arn:aws:iam::${local.account_id}:role/TerraformStateRole"
  }
}
EOF
}

# Generate provider config with account-specific role assumption
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region = "${local.aws_region}"
  assume_role {
    role_arn = "arn:aws:iam::${local.account_id}:role/TerraformDeployRole"
  }
  default_tags {
    tags = {
      Environment = "${local.environment}"
      ManagedBy   = "terraform"
      Repository  = "my-org/infrastructure"
    }
  }
}
EOF
}

An Environment-Specific Module Config

Each environment’s component config just points at the shared module and overrides what’s different:

# live/prod/eks/terragrunt.hcl

locals {
  account_vars = read_terragrunt_config(find_in_parent_folders("account.hcl"))
  env          = local.account_vars.locals.environment
}

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "../../../modules/eks-cluster"
}

inputs = {
  cluster_name    = "my-app-${local.env}"
  cluster_version = "1.29"

  node_groups = {
    general = {
      instance_types = ["m6i.xlarge"]
      min_size       = 3
      max_size       = 10
      desired_size   = 5
    }
    spot = {
      instance_types = ["m6i.xlarge", "m6a.xlarge", "m5.xlarge"]
      capacity_type  = "SPOT"
      min_size       = 0
      max_size       = 20
      desired_size   = 5
    }
  }

  enable_cluster_autoscaler  = true
  enable_aws_load_balancer   = true
  enable_ebs_csi_driver      = true
}

The dev equivalent just changes instance types and sizing — the module and backend wiring are inherited automatically.
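
As a sketch of what that looks like (the dev sizing below is illustrative, not prescriptive):

# live/dev/eks/terragrunt.hcl

locals {
  account_vars = read_terragrunt_config(find_in_parent_folders("account.hcl"))
  env          = local.account_vars.locals.environment
}

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "../../../modules/eks-cluster"
}

inputs = {
  cluster_name    = "my-app-${local.env}"
  cluster_version = "1.29"

  # Smaller, cheaper nodes — everything else is inherited
  node_groups = {
    general = {
      instance_types = ["t3.large"]
      min_size       = 1
      max_size       = 3
      desired_size   = 1
    }
  }
}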

GitLab CI Integration

The pipeline uses OIDC to assume roles in each target account — no long-lived keys.

# .gitlab-ci.yml (relevant excerpt)

variables:
  TG_VERSION: "0.57.0"
  TF_VERSION: "1.8.0"

.aws_oidc: &aws_oidc
  id_tokens:
    GITLAB_OIDC_TOKEN:
      aud: https://gitlab.com
  before_script:
    - |
      export $(printf "AWS_ACCESS_KEY_ID=%s AWS_SECRET_ACCESS_KEY=%s AWS_SESSION_TOKEN=%s" \
        $(aws sts assume-role-with-web-identity \
          --role-arn "arn:aws:iam::${AWS_ACCOUNT_ID}:role/GitLabCIRole" \
          --role-session-name "gitlab-ci-${CI_JOB_ID}" \
          --web-identity-token "${GITLAB_OIDC_TOKEN}" \
          --query "Credentials.[AccessKeyId,SecretAccessKey,SessionToken]" \
          --output text))

plan:
  stage: plan
  image: alpine/terragrunt:${TF_VERSION}
  <<: *aws_oidc
  script:
    - cd live/${TARGET_ENV}/${TARGET_COMPONENT}
    - terragrunt plan -out=tfplan
  artifacts:
    paths: ["live/${TARGET_ENV}/${TARGET_COMPONENT}/tfplan"]
    expire_in: 1 hour

apply:
  stage: apply
  image: alpine/terragrunt:${TF_VERSION}
  <<: *aws_oidc
  script:
    - cd live/${TARGET_ENV}/${TARGET_COMPONENT}
    - terragrunt apply tfplan
  when: manual
  rules:
    - if: $CI_COMMIT_BRANCH == "main"

Key Patterns to Internalize

Never run terragrunt run-all apply in production without a plan review step. run-all is great for dev teardowns and rebuilds. In prod, always plan individual components, review the output, then apply. The blast radius of a mis-targeted run-all is significant.
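
In command terms, the difference looks roughly like this (component paths follow the layout above):

# Dev: rebuild everything under live/dev in dependency order
cd live/dev
terragrunt run-all apply

# Prod: one component at a time, with a reviewed plan in between
cd live/prod/eks
terragrunt plan -out=tfplan
terragrunt show tfplan    # human review happens here
terragrunt apply tfplan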

Keep modules generic, keep inputs specific. Your eks-cluster module shouldn’t know what environment it’s in. Pass everything — cluster name, sizing, feature flags — as inputs. This keeps modules reusable across projects, not just environments.
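
Concretely, the module's variable interface stays environment-agnostic. A sketch of what modules/eks-cluster/variables.tf might expose — variable names beyond those used in the configs above are assumptions:

# modules/eks-cluster/variables.tf — nothing here knows about "dev" or "prod"
variable "cluster_name" {
  type = string
}

variable "cluster_version" {
  type = string
}

variable "node_groups" {
  type = any
}

variable "enable_cluster_autoscaler" {
  type    = bool
  default = false
}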

Use dependency blocks over data sources for cross-component references. If your RDS module needs the VPC ID from your VPC module, reference it via Terragrunt’s dependency block rather than a data.terraform_remote_state lookup. It makes the dependency graph explicit, so run-all commands apply components in the correct order and --terragrunt-parallelism can safely parallelize the rest.

# Referencing VPC outputs from another Terragrunt component
dependency "vpc" {
  config_path = "../vpc"
  mock_outputs = {
    vpc_id          = "vpc-00000000"
    private_subnets = ["subnet-00000000"]
  }
  mock_outputs_allowed_terraform_commands = ["plan", "validate"]
}

inputs = {
  vpc_id  = dependency.vpc.outputs.vpc_id
  subnets = dependency.vpc.outputs.private_subnets
}

The full module library for this pattern is something I’m working toward open-sourcing. More posts in this series will cover the VPC module design, EKS add-on management via Helm, and the RDS Aurora module with automated pg_repack scheduling.