Debating between count and for_each in Terraform

In Terraform, we often have to create an array of resources of the same type with similar attribute values. For reusability, manageability and the DRY principle, it’s better to use a loop. Terraform HCL supports loops through meta-arguments. Currently, there are two options to drive a loop: count and for_each.

Problem with `count` loop

The book Terraform Up and Running (Chapter 5, Terraform Tips and Tricks) regards count as Terraform’s oldest, simplest and most limited iteration construct. One of its big limitations is that indexes shift when the length of the resource array changes. The book makes the point with a good example:

variable "user_names" {
  description = "Create IAM users with these names" 
  type = list(string)
  default = ["neo", "trinity", "morpheus"]
}
# The Example from the book Terraform Up and Running
resource "aws_iam_user" "example" { 
  count = length(var.user_names)
  name = var.user_names[count.index]
}

When you run terraform apply, three IAM users are created, and the plan looks like:

# aws_iam_user.example[0] will be created
+ resource "aws_iam_user" "example" {
    + name = "neo"
    (...) 
  }
# aws_iam_user.example[1] will be created
+ resource "aws_iam_user" "example" {
    + name = "trinity"
    (...) 
}
# aws_iam_user.example[2] will be created
+ resource "aws_iam_user" "example" {
    + name = "morpheus"
    (...) 
}

Then if you remove “trinity” from the variable user_names, and run terraform plan, the plan would look like:

Terraform will perform the following actions:
      # aws_iam_user.example[1] will be updated in-place
      ~ resource "aws_iam_user" "example" {
            id            = "trinity"
          ~ name          = "trinity" -> "morpheus"
        }
      # aws_iam_user.example[2] will be destroyed
      - resource "aws_iam_user" "example" {
          - id            = "morpheus" -> null
          - name          = "morpheus" -> null
        }
Plan: 0 to add, 1 to change, 1 to destroy.

In this plan, instead of deleting the second user, Terraform renames the second user and deletes the third. While the plan matches the code logic, it is often an unwanted result, considering the resource could be one that many other resources depend on, such as a subnet.

This is a good example of the problem with count. Terraform identifies each resource in the generated list by its position (index). When the length changes, the indexes shift. If you remove an item from the middle of the list, Terraform treats every resource after that item as changed: it updates them in place where it can, and deletes and re-creates them when the changed attribute forces replacement. As a consequence, you may lose availability or, even worse, lose data.

Embrace `for_each` loop

If we modify the example above to use for_each, the code looks like:

variable "user_names" {
  description = "Create IAM users with these names" 
  type = list(string)
  default = ["neo", "trinity", "morpheus"]
}
# The Example from the book Terraform Up and Running
# Note: HCL requires the opening brace on the same line as the block header.
resource "aws_iam_user" "example" {
  for_each = toset(var.user_names)
  name     = each.value
}

This results in the creation of three IAM users. If you remove the “trinity” user from the middle of the input collection and apply, the plan looks like this:

Terraform will perform the following actions:
      # aws_iam_user.example["trinity"] will be destroyed
      - resource "aws_iam_user" "example" {
          - arn           = "arn:aws:iam::123456789012:user/trinity" -> null
          - name          = "trinity" -> null
        }
Plan: 0 to add, 0 to change, 1 to destroy.

The plan suggests that Terraform will delete the very resource that was taken out from the middle of the input collection and no existing resources in the array are impacted.

Note that in the code snippet above, we use the toset() function to convert the input list to a set (an unordered collection of unique strings). This is because for_each on a resource accepts only a map or a set of strings. If the resources being created have another attribute whose value needs to be individualized, we can loop over a map instead and store the individualized attribute values in the map values.
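As a minimal sketch of looping over a map, the snippet below gives each IAM user its own path. The variable name users, its shape and the path values are hypothetical and not taken from the book; name and path are regular arguments of aws_iam_user.

```hcl
# Hypothetical variable: map keys are user names, map values hold per-user settings.
variable "users" {
  description = "IAM users to create, keyed by user name"
  type = map(object({
    path = string
  }))
  default = {
    neo      = { path = "/zion/" }
    trinity  = { path = "/zion/" }
    morpheus = { path = "/nebuchadnezzar/" }
  }
}

resource "aws_iam_user" "example" {
  for_each = var.users
  name     = each.key        # the map key identifies the instance
  path     = each.value.path # individualized attribute taken from the map value
}
```

Each instance is then addressed by its key, for example aws_iam_user.example["neo"], so removing one entry from the map only affects that instance.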

A few pages later, the book discusses an important limitation shared by count and for_each: the number of instances must not be computed from the outputs of other resources. Terraform must be able to evaluate count and for_each during the plan phase, before any resources are created or modified. The value can come from hardcoded values, variables, data sources, or even a list of other resources in the same configuration, so long as its length can be determined during the plan; it cannot come from computed resource outputs.
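To illustrate the limitation, here is a sketch of a configuration that is expected to fail. It assumes the hashicorp/random provider is available; the resource names are made up for illustration.

```hcl
# Assumes the hashicorp/random provider; all names here are illustrative.
resource "random_integer" "user_count" {
  min = 1
  max = 3
}

resource "aws_iam_user" "example" {
  # "result" is a computed output that is unknown until random_integer
  # has actually been created, so Terraform cannot size this array
  # during the plan on a fresh state.
  count = random_integer.user_count.result
  name  = "user-${count.index}"
}
```

On a fresh state, terraform plan should reject this with an error along the lines of "Invalid count argument", because the value of count depends on a resource attribute that cannot be determined until apply.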

A real-life example with classic pattern

The book then touches on another advantage of for_each: the ability to create multiple inline blocks within a resource. The HashiCorp documentation also has a section on when to use for_each instead of count, with a similar example. That section mentions when to use count only in its opening sentence: if your instances are almost identical, count is appropriate.
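For readers who have not seen inline blocks generated with for_each, here is a minimal sketch using a dynamic block. The security group, the variable ingress_ports and the port numbers are assumptions chosen for illustration, not the book’s own example.

```hcl
# Hypothetical input: one ingress rule is generated per port.
variable "ingress_ports" {
  type    = list(number)
  default = [80, 443]
}

resource "aws_security_group" "example" {
  name = "example-dynamic-ingress"

  # The dynamic block repeats the inline "ingress" block once per element.
  dynamic "ingress" {
    for_each = var.ingress_ports
    content {
      from_port   = ingress.value
      to_port     = ingress.value
      protocol    = "tcp"
      cidr_blocks = ["0.0.0.0/0"]
    }
  }
}
```

Inside the content block, the iterator takes the name of the dynamic block (ingress here), so ingress.value is the current port.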

After reading all the literature on this topic, for_each sounds like a no-brainer. Yet in my early experience with one specific use case, count felt more efficient. The example from the book is too simplistic. To better compare the two options, I need a realistic example. Let’s consider a use case where, after creating a VPC, I need to create the following:

one NAT gateway for each availability zone (each NAT Gateway maps to one subnet and one allocation ID)
one public subnet for each availability zone
one public IP allocation in each availability zone

We can summarize the relationships between resources in the following diagram:

I deliberately picked this example because it is self-contained. So are all the code snippets in this post. The example also demonstrates a classic relationship between resources that we can find everywhere in infrastructure automation. Here’s another example off the bat:

create an array of aws_subnet, each providing a subnet_id;
create an array of aws_route_table, each providing a route_table_id;
now, create an array of aws_route_table_association, each referencing one aws_subnet (by subnet_id) and one aws_route_table (by route_table_id).
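The three steps above can be sketched with count as follows. This is only an illustration of the pattern: the variable subnet_cidrs, the CIDR values and the resource names are hypothetical.

```hcl
# Hypothetical input list; every array below has the same length.
variable "subnet_cidrs" {
  type    = list(string)
  default = ["10.0.0.0/24", "10.0.1.0/24"]
}

resource "aws_vpc" "example" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "example" {
  count      = length(var.subnet_cidrs)
  vpc_id     = aws_vpc.example.id
  cidr_block = var.subnet_cidrs[count.index]
}

resource "aws_route_table" "example" {
  count  = length(var.subnet_cidrs)
  vpc_id = aws_vpc.example.id
}

# Each association pairs the subnet and the route table at the same index.
resource "aws_route_table_association" "example" {
  count          = length(var.subnet_cidrs)
  subnet_id      = aws_subnet.example[count.index].id
  route_table_id = aws_route_table.example[count.index].id
}
```

The third resource type ties the other two together one-to-one, which is the same shape as the NAT gateway use case.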

If we address the NAT gateway example, we’re good with many other resources that share the same relationship pattern. In the next section, we’ll first implement the NAT gateway example using a count loop.

Implementation using `count`

The use case exemplifies the pattern where multiple types of resources are related to each other. We need a loop for each resource type, resulting in multiple arrays of different resource types. Moreover, the elements in the aws_nat_gateway array have 1-to-1 mappings with both the aws_subnet array and the aws_eip array.

With count, I created Terraform code with everything in a single main.tf file for the convenience of illustration, like this:

provider "aws" {}

variable "vpc_cidr_block" {
  type    = string
  default = "147.206.0.0/16"
}
variable "public_subnets_cidr_list" {
  type    = list(any)
  default = ["147.206.0.0/22", "147.206.4.0/22"]
  #default = ["147.206.0.0/22", "147.206.4.0/22", "147.206.8.0/22"]
}

data "aws_availability_zones" "this" {}

resource "aws_vpc" "base_vpc" {
  cidr_block = var.vpc_cidr_block
}

resource "aws_internet_gateway" "internet_gw" {
  vpc_id = aws_vpc.base_vpc.id
}

resource "aws_subnet" "public_subnets" {
  count                   = length(var.public_subnets_cidr_list)
  vpc_id                  = aws_vpc.base_vpc.id
  cidr_block              = var.public_subnets_cidr_list[count.index]
  map_public_ip_on_launch = true
  availability_zone       = data.aws_availability_zones.this.names[count.index]
}

resource "aws_eip" "nat_eips" {
  count = length(var.public_subnets_cidr_list)
}

resource "aws_nat_gateway" "nat_gws" {
  count         = length(var.public_subnets_cidr_list)
  subnet_id     = aws_subnet.public_subnets[count.index].id
  allocation_id = aws_eip.nat_eips[count.index].id
  depends_on    = [aws_internet_gateway.internet_gw]
}

The intent is to create a subnet, a public IP and a NAT gateway in each of two availability zones. I also want to add one more AZ in the future and have the code handle the addition gracefully. To add the new AZ, I comment out the two-CIDR default of public_subnets_cidr_list and uncomment the three-CIDR default below it. The plan after this code change looks like this:

Terraform will perform the following actions:

  # aws_eip.nat_eips[2] will be created
  + resource "aws_eip" "nat_eips" {
      (...) 
    }

  # aws_subnet.public_subnets[2] will be created
  + resource "aws_subnet" "public_subnets" {
      + availability_zone                              = "us-east-1c"
      + cidr_block                                     = "147.206.8.0/22"
      + id                                             = (known after apply)
      (...) 
    }

  # aws_nat_gateway.nat_gws[2] will be created
  + resource "aws_nat_gateway" "nat_gws" {
      + allocation_id                      = (known after apply)
      + subnet_id                          = (known after apply)
      (...) 
    }

Plan: 3 to add, 0 to change, 0 to destroy.

The plan creates a set of resources required for the new availability zone without touching any existing resource, which is expected.

Why do I focus only on adding a new subnet in a new AZ, and not on deleting a subnet or modifying the CIDR of an existing one? Because we rarely do that in production. We rarely stop using an availability zone, nor do we modify the CIDR of an existing subnet. In fact, the AWS API does not even offer a call to change the existing IPv4 CIDR of a subnet or the primary CIDR of a VPC. In our infrastructure operation, we make such decisions upfront, so these values are immutable once provisioned. We simply don’t need to consider all the possible CRUD actions on a resource.

So, the count loop does the job just fine. Now, what about for_each?

Implementation with `for_each`: first attempt

Since for_each takes a set or a map, I have to make some adjustments. My first attempt looks like this:

provider "aws" {}

variable "vpc_cidr_block" {
  type    = string
  default = "147.206.0.0/16"
}

variable "public_subnets_cidr_list" {
  type    = list(any)
  default = ["147.206.0.0/22", "147.206.4.0/22"] # 2 AZ
  #default = ["147.206.0.0/22", "147.206.4.0/22", "147.206.8.0/22"] # 3 AZ
}

data "aws_availability_zones" "this" {}

resource "aws_vpc" "base_vpc" {
  cidr_block = var.vpc_cidr_block
}

resource "aws_internet_gateway" "internet_gw" {
  vpc_id = aws_vpc.base_vpc.id
}

locals {
  subnet_config = [
    for i in range(length(var.public_subnets_cidr_list)) : {
      cidr = var.public_subnets_cidr_list[i]
      az   = data.aws_availability_zones.this.names[i]
    }
  ]
}

resource "aws_subnet" "public_subnets" {
  for_each                = { for idx, rec in local.subnet_config : idx => rec }
  vpc_id                  = aws_vpc.base_vpc.id
  cidr_block              = each.value.cidr
  map_public_ip_on_launch = true
  availability_zone       = each.value.az
  tags                    = { Name = "PUBLIC-SUBNET" }
}

resource "aws_eip" "nat_eips" {
  for_each = toset(var.public_subnets_cidr_list)
  tags     = { Name = "NATEIP" }
}

data "aws_subnets" "public_subnets" {
  filter {
    name   = "tag:Name"
    values = ["PUBLIC-SUBNET"]
  }
  depends_on = [aws_subnet.public_subnets]
}

data "aws_eips" "nat_eips" {
  filter {
    name   = "tag:Name"
    values = ["NATEIP"]
  }
  depends_on = [aws_eip.nat_eips]
}

locals {
  nat_gw_config = [
    for i in range(length(var.public_subnets_cidr_list)) : {
      subnet_id = data.aws_subnets.public_subnets.ids[i]
      alloc_id  = data.aws_eips.nat_eips.allocation_ids[i]
    }
  ]
}

resource "aws_nat_gateway" "nat_gws" {
  for_each      = { for idx, rec in local.nat_gw_config : idx => rec }
  subnet_id     = each.value.subnet_id
  allocation_id = each.value.alloc_id
  depends_on    = [aws_internet_gateway.internet_gw]
}

Note that I have to create a couple of data sources (nat_eips and public_subnets) and local values (subnet_config and nat_gw_config) in order to build the required map data structures and feed them to the for_each arguments.

After terraform apply, let’s edit public_subnets_cidr_list to add the subnet CIDR for the third AZ. The plan looks like this:

Terraform will perform the following actions:

  # data.aws_eips.nat_eips will be read during apply
  # (depends on a resource or a module with changes pending)
 <= data "aws_eips" "nat_eips" {
      + allocation_ids = (known after apply)
       (...) 
    }

  # data.aws_subnets.public_subnets will be read during apply
  # (depends on a resource or a module with changes pending)
 <= data "aws_subnets" "public_subnets" {
      + id   = (known after apply)
      + ids  = (known after apply)
       (...) 
    }

  # aws_eip.nat_eips["147.206.8.0/22"] will be created
  + resource "aws_eip" "nat_eips" {
      + allocation_id        = (known after apply)
       (...) 
    }

  # aws_nat_gateway.nat_gws["0"] must be replaced
-/+ resource "aws_nat_gateway" "nat_gws" {
      ~ allocation_id                      = "eipalloc-0ffe32519d20b9b7f" # forces replacement -> (known after apply) # forces replacement
      ~ subnet_id                          = "subnet-0b4b202056c75bb0a" # forces replacement -> (known after apply) # forces replacement
      (...) 
        # (1 unchanged attribute hidden)
    }

  # aws_nat_gateway.nat_gws["1"] must be replaced
-/+ resource "aws_nat_gateway" "nat_gws" {
      ~ allocation_id                      = "eipalloc-0ff1d32fb271b5cf4" # forces replacement -> (known after apply) # forces replacement
      ~ subnet_id                          = "subnet-09b6486fbe44e9795" # forces replacement -> (known after apply) # forces replacement
       (...) 
        # (1 unchanged attribute hidden)
    }

  # aws_nat_gateway.nat_gws["2"] will be created
  + resource "aws_nat_gateway" "nat_gws" {
      + allocation_id                      = (known after apply)
      + subnet_id                          = (known after apply)
       (...) 
    }

  # aws_subnet.public_subnets["2"] will be created
  + resource "aws_subnet" "public_subnets" {
      + availability_zone                              = "us-east-1c"
      + cidr_block                                     = "147.206.8.0/22"
      (...) 
    }

Plan: 5 to add, 0 to change, 2 to destroy.

Wait a second, I expect the template to create a new subnet, a new elastic IP and a new NAT gateway in that new AZ. But why does it plan to delete the two existing NAT gateways and recreate two? This doesn’t make for_each an appealing option at all.

Apart from the disruptive plan, there are other problems. First, since the aws_subnet resources require cidr_block and availability_zone values, I have to build a map (subnet_config) for the resource array to consume. Similarly, I have to build a second map (nat_gw_config) to create the resource array for aws_nat_gateway, which requires subnet_id and allocation_id. This map takes more work to build. Because of the 1-to-1 relationship between subnet_id and alloc_id, I have to fetch the values from two data sources (data.aws_subnets.public_subnets and data.aws_eips.nat_eips) and pair them up by a common index (in the nat_gw_config local). Can I tidy it up and combine the two maps into one? Not really, because the second map (nat_gw_config) uses a data source that depends on the subnets, which in turn depend on the first map (subnet_config). Trying to combine the maps causes a circular dependency!

Also, the added data sources make the code less readable. As Marcel L pointed out in his post, two cons of for_each are its complexity and its need for a map (to store multiple attribute values). Now we seem to have one more: it may cause unintended deletions.

Is `for_each` a bad idea?

Let’s find out why for_each could destroy two existing NAT gateways.

Notice that I built the map nat_gw_config by looping through the list variable public_subnets_cidr_list. After the first apply, we appended one more string to the end of that list, without changing the existing order. However, the devil lies in the order of the string lists returned by the data sources. By printing this map, we found that its original value, before the AZ addition, was:

index	alloc_id	subnet_id
0	eipalloc-0ffe32519d20b9b7f	subnet-0b4b202056c75bb0a
1	eipalloc-0ff1d32fb271b5cf4	subnet-09b6486fbe44e9795

Based on this, the NAT gateway with index 0 is created with eipalloc-***b7f and subnet-***b0a, and the NAT gateway with index 1 is created with eipalloc-***cf4 and subnet-***795. After we add the third AZ and apply, the new map, with a new alloc_id and a new subnet_id, looks like this:

index	alloc_id	subnet_id
0	eipalloc-0b70460721596e33f (new)	subnet-0b4b202056c75bb0a
1	eipalloc-0ffe32519d20b9b7f	subnet-0335071ced2dc9922 (new)
2	eipalloc-0ff1d32fb271b5cf4	subnet-09b6486fbe44e9795

Two factors are at play. First, the data sources return their IDs (data.aws_subnets.public_subnets.ids and data.aws_eips.nat_eips.allocation_ids) in sorted order. Second, newly generated IDs are effectively random, so whichever way the list is sorted, a new ID can land anywhere in it. In this particular result, the new alloc_id lands at the beginning and the new subnet_id lands in the middle. As a result, the NAT gateways with index 0 and 1 both change, so they have to be destroyed and replaced.

All of this comes from having to build a map whose values come from two different data sources. The values are not predetermined and contain a random part. When we add another AZ, the entire map gets shuffled, leading to the deletion of existing resources. Yikes.

Implementation with `for_each`: second attempt

The first draft of this post drew some ideas on Reddit. One redditor pointed out that the for_each snippet above isn’t the optimal way. With a few tricks, we can manage the map so that existing entries stay put when we add a new AZ. The strategy is:

1. Avoid using data sources to retrieve attribute values;
2. Use a unique key to identify objects in the map;
3. Look up the resource directly by the unique key.

We’re able to do #2 and #3 because when a resource has the for_each argument set, the resource itself becomes a map of objects. We can then locate an instance of that resource by its key. We get to choose the key, so long as it uniquely identifies the resource. Below is the revised code snippet with for_each:

provider "aws" {}

variable "vpc_cidr_block" {
  type    = string
  default = "147.206.0.0/16"
}

variable "public_subnets_cidr_list" {
  type    = list(any)
  default = ["147.206.0.0/22", "147.206.4.0/22"] # 2 AZ
  #default = ["147.206.0.0/22", "147.206.4.0/22", "147.206.8.0/22"] # 3 AZ
}

data "aws_availability_zones" "this" {}

resource "aws_vpc" "base_vpc" {
  cidr_block = var.vpc_cidr_block
}

resource "aws_internet_gateway" "internet_gw" {
  vpc_id = aws_vpc.base_vpc.id
}

locals {
  subnet_config = {
    for cidr in var.public_subnets_cidr_list : md5(cidr) => {
      cidr = cidr
      az   = data.aws_availability_zones.this.names[index(var.public_subnets_cidr_list, cidr)]
    }
  }
}

resource "aws_subnet" "public_subnets" {
  for_each                = local.subnet_config
  vpc_id                  = aws_vpc.base_vpc.id
  cidr_block              = each.value.cidr
  map_public_ip_on_launch = true
  availability_zone       = each.value.az
}

resource "aws_eip" "nat_eips" {
  for_each = { for cidr in var.public_subnets_cidr_list : md5(cidr) => cidr }
}

locals {
  nat_gw_config = {
    for cidr in var.public_subnets_cidr_list : md5(cidr) => {
      subnet_id = aws_subnet.public_subnets[md5(cidr)].id
      alloc_id  = aws_eip.nat_eips[md5(cidr)].allocation_id
    }
  }
}

resource "aws_nat_gateway" "nat_gws" {
  for_each      = local.nat_gw_config
  subnet_id     = each.value.subnet_id
  allocation_id = each.value.alloc_id
  depends_on    = [aws_internet_gateway.internet_gw]
}

In this example, I use the MD5 hash of the CIDR as the unique key to ensure a consistent mapping between allocation ID and subnet ID. When a new AZ is added, the new allocation-subnet ID pair gets its own new key, and the existing keys are untouched. The key can be any identifier (even the CIDR itself) as long as it is unique and we do not change how it is derived after the first apply.

One more shot with `for_each`

The code snippet above got rid of the data sources, but it still has to rely on two local values (subnet_config and nat_gw_config) as helpers. Are they absolutely necessary?

Not really. The Terraform documentation has a page about References to Values, where it states:

If the resource has the count argument set, the reference’s value is a list of objects representing its instances.
If the resource has the for_each argument set, the reference’s value is a map of objects representing its instances.

In other words, when we use for_each with a map as input, we also get a map as output, which is the resource array itself. Its keys are the same as those of the input map, so we can reuse them. I know that sounds too abstract. Here’s the refined code:

provider "aws" {}

variable "vpc_cidr_block" {
  type    = string
  default = "147.206.0.0/16"
}

variable "public_subnets_cidr_list" {
  type = list(any)
  #default = ["147.206.0.0/22", "147.206.4.0/22"] # 2 AZ
  default = ["147.206.0.0/22", "147.206.4.0/22", "147.206.8.0/22"] # 3 AZ
}

data "aws_availability_zones" "this" {}

resource "aws_vpc" "base_vpc" {
  cidr_block = var.vpc_cidr_block
}

resource "aws_internet_gateway" "internet_gw" {
  vpc_id = aws_vpc.base_vpc.id
}

resource "aws_subnet" "public_subnets" {
  for_each = { for cidr in var.public_subnets_cidr_list : md5(cidr) => {
    cidr = cidr
    az   = data.aws_availability_zones.this.names[index(var.public_subnets_cidr_list, cidr)]
    }
  }
  vpc_id                  = aws_vpc.base_vpc.id
  cidr_block              = each.value.cidr
  map_public_ip_on_launch = true
  availability_zone       = each.value.az
}

resource "aws_eip" "nat_eips" {
  for_each = { for cidr in var.public_subnets_cidr_list : md5(cidr) => cidr }
}

resource "aws_nat_gateway" "nat_gws" {
  for_each      = aws_subnet.public_subnets
  subnet_id     = aws_subnet.public_subnets[each.key].id
  allocation_id = aws_eip.nat_eips[each.key].allocation_id
  depends_on    = [aws_internet_gateway.internet_gw]
}

Voila. I use the MD5 hash of the CIDR as the key again, this time to create both aws_subnet and aws_eip. I also followed the example of chaining for_each between resource types. This way, when creating aws_nat_gateway, I can reference an instance in each resource array by the same key. Chaining for_each is very handy. But admittedly, it took me several iterations to get there. The code is neater, but not as straightforward to read because of the list/map comprehension.

Conclusion

I came across a team whose code review guideline strongly favours for_each. After reading the book, I see where that comes from. But I don’t find count to be evil. That prompted me to dive deep into this topic.

In this post we brought up a classic pattern of relationship between resources, and examined several ways to implement it using count and for_each. Using count can be straightforward, but it carries the risk of index shifting if an element is added or removed in the middle of the resource array. On the other hand, for_each is more powerful, but it requires some crafting with the Python-style list/map comprehension.

My recommendation is to start with a holistic look at the types of resources to create in a loop and how they relate to each other. Go with count if index shifting isn’t a risk, for example when you need to create one instance of a resource conditionally. Otherwise, use a for_each loop if the team is comfortable with the list/map comprehension. In cases where we need to conditionally create several instances of the same resource, we can use a technique such as:

for_each = var.disabled ? {} : data.any_resource.map
for_each = var.disabled ? toset([]) : toset(data.any_resource.list)

In fact, the recommendation from AWS Terraform best practice is highly in favour of for_each.
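For the single-instance conditional case where count remains a good fit, a minimal sketch might look like the following. The variable create_audit_user and the user name are hypothetical.

```hcl
# Hypothetical flag that toggles one optional resource.
variable "create_audit_user" {
  type    = bool
  default = false
}

resource "aws_iam_user" "audit" {
  # Either zero or one instance, so there is no index to shift.
  count = var.create_audit_user ? 1 : 0
  name  = "audit"
}

output "audit_user_arn" {
  # one() returns the single element, or null when the list is empty.
  value = one(aws_iam_user.audit[*].arn)
}
```

Because the array never holds more than one element, the index-shifting problem discussed earlier cannot occur here.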
