IAM role anywhere

IAM Roles for any workload

Background

A few months back a client of mine wanted to use a GitLab pipeline to deploy infrastructure on AWS with Terraform. The key question was how to authenticate the Terraform process running in the pipeline to AWS with temporary credentials. Having worked it out on GitHub, my proposal at the time was to add an OIDC provider to represent the GitLab runner.

After a few months, they told me that they self-host their GitLab instance. The idea above depends on exposing an identity provider document on the public Internet, which the client is unable to do.

Now, I have an idea: IAM Roles Anywhere.

IAM Role Introduction

Many 101 tutorials ask beginners to create standalone IAM users (or groups) with IAM policies directly attached. For programmatic access they also have you create an access key ID and secret access key pair and pass them to an external application. These keys are long-term credentials and, worse, never expire. Leakage of such long-term credentials has been such a headache that AWS strongly discourages their use. You can feel the discouragement when trying to create an access key through the web console, or in the banners at the top of the documentation page about how to do so.

The recommendation is to use temporary security credentials. In the context of AWS that means IAM roles. Users assume an IAM role by issuing an API call, and the Security Token Service (STS) returns temporary credentials in response.

[Diagram: CLI. Request: AssumeRole*. Response: AccessKeyId, SecretAccessKey, SessionToken.]

This diagram has several variations. For example, the request can be AssumeRole, AssumeRoleWithSAML or AssumeRoleWithWebIdentity, depending on whether and how the user information is federated with an external identity store. The returned response, a triplet of values, makes up the temporary credential that we should use in any secure environment. It must be renewed before expiry. This model works not only for human identities (e.g. SAML integration, OIDC integration, cross-account access) but also for workload identities (e.g. EC2 instance profile, Lambda execution role, ECS task role, etc.).

Another good example is IAM Roles for Service Accounts (IRSA), where a web identity represents a Kubernetes Service Account to gain role credentials using the AssumeRoleWithWebIdentity API. In this post, however, I’d like to explore the IAM role for the EC2 instance profile further.
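As a rough illustration of the request and the returned triplet, here is a minimal sketch using the AWS CLI. The function name assume_role_env, the role ARN and the session name are placeholders of mine, and the caller is assumed to already hold credentials that are allowed to assume the role.

```shell
# Assume a role and export the returned triplet as environment variables.
# The function name and the default session name are illustrative assumptions.
assume_role_env() {
  role_arn="$1"
  session_name="${2:-demo-session}"

  # --query picks the three values; --output text prints them tab-separated.
  creds=$(aws sts assume-role \
    --role-arn "$role_arn" \
    --role-session-name "$session_name" \
    --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
    --output text) || return 1

  # Split the single line into the three variables the CLI and SDKs look for.
  read -r AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN <<EOF
$creds
EOF
  export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN
}

# Example (placeholder ARN):
#   assume_role_env "arn:aws:iam::123456789012:role/demo-role" my-session
```

The exported values stop working once the session expires, so a long-running job would have to call the function again before then.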

The IMDS service

For an EC2 instance, we all know that we can associate an IAM role through the instance profile, which grants any process on the instance that uses the AWS SDK the permissions associated with that role. At a lower level, this relies on the IMDS (Instance Metadata Service) available on the instance.

If an EC2 instance’s profile points to an IAM role, a process running on the instance using the AWS SDK still needs to get the triplet that STS issues. It is summarized in this diagram:

[Diagram: on an EC2 instance, an application process uses the AWS SDK to reach IMDS v2 at 169.254.169.254. Request: AssumeRole. Response: AccessKeyId, SecretAccessKey, SessionToken.]

The IMDS is a service available at a link-local IP address (169.254.169.254) on the EC2 instance. Requests made to this IP address are not routed elsewhere. The Instance Metadata Service is a means for the cloud service provider’s virtualization layer to share information with the processes on the operating system of a virtual machine. It responds with information related to the instance itself, such as the subnets, IAM role, instance ID, AMI ID and security groups. The instance metadata also includes the user data script for the cloud-init process to consume and, most relevantly, the role credential for the instance. Note that the process on the instance does not call STS directly: the credentials are obtained on the instance’s behalf and served through IMDS. The AssumeRole calls are logged in CloudTrail.

All major cloud vendors (AWS, Azure and GCP) use the IMDS mechanism, and this mechanism obviously draws the attention of bad actors. I found some good articles on this here and here.

IAM Role for EC2 Workload

For EC2 instances at AWS, the initial IMDSv1 was introduced in 2012 and allows a plain GET request to fetch instance metadata. IMDSv1 is subject to attacks such as SSRF (server-side request forgery). In 2019 AWS introduced IMDSv2, which mitigates those vulnerabilities by requiring a session token. As of this writing, the recommendation is to use IMDSv2. Here is an example of how to fetch instance metadata, including the credential:

# Grab a session token (valid for 6 hours)
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
# Get top-level instance metadata information
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/
# Get the name of the role
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/iam/security-credentials/
# Get the credential for the role session (replace InstanceProfileRoleName with the name returned above)
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/iam/security-credentials/InstanceProfileRoleName

These commands emulate how the SDK library fetches the credentials for the instance profile role. You can also find similar commands in the documentation. However, there aren’t many details about how the instance metadata service interacts with STS, except a general statement:

These security credentials are temporary and we rotate them automatically. We make new credentials available at least five minutes before the expiration of the old credentials.
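The lookup steps shown earlier can be chained into a single function that fails early when no token can be obtained. This is only a sketch: the function name imds_role_credentials, the 300-second token TTL and the 2-second timeout are my own choices, and it assumes the instance profile exposes a single role.

```shell
# Fetch the current role credentials from IMDSv2 in one go.
# Function name, TTL and timeout values are illustrative assumptions.
imds_role_credentials() {
  base="http://169.254.169.254/latest"

  # Step 1: session token. This is the call that fails when IMDS is
  # disabled or the PUT response hop limit is too low for the caller.
  token=$(curl -sf --max-time 2 -X PUT "$base/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 300") || {
    echo "could not get an IMDSv2 token" >&2
    return 1
  }

  # Step 2: name of the role attached through the instance profile.
  role=$(curl -sf --max-time 2 -H "X-aws-ec2-metadata-token: $token" \
    "$base/meta-data/iam/security-credentials/") || return 1

  # Step 3: the credential document (JSON) for that role.
  curl -sf --max-time 2 -H "X-aws-ec2-metadata-token: $token" \
    "$base/meta-data/iam/security-credentials/$role"
}
```

Because the credentials are rotated automatically, calling the function again later simply returns whatever is current at that time.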

When creating a new EC2 instance, make sure that the instance metadata options have http_endpoint enabled, which turns on IMDS. Also set http_tokens to required, which enforces IMDSv2 exclusively. With that setup, the application has to use an SDK version that supports IMDSv2. Another metadata option is http_put_response_hop_limit, with a default of 1. It limits the number of network hops the token response can travel. If the process runs in a Docker container with bridge networking mode, set it to 2 or the process cannot even obtain a token.

[Diagram: on an EC2 instance, an application process inside the Docker daemon’s bridge network uses the AWS SDK to reach IMDS v2 at 169.254.169.254. Request: AssumeRole. Response: AccessKeyId, SecretAccessKey, SessionToken.]

The diagram above illustrates this scenario with two hops.
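For an instance that already exists, the same metadata options can be set with the AWS CLI, where they appear as hyphenated flags rather than the underscore names used above. A minimal sketch, with a function name and instance ID that are placeholders of mine:

```shell
# Enforce IMDSv2 on an existing instance and set the PUT response hop limit.
# Function name and instance ID are illustrative assumptions.
enforce_imdsv2() {
  instance_id="$1"
  hop_limit="${2:-1}"   # use 2 for containers on a bridge network

  aws ec2 modify-instance-metadata-options \
    --instance-id "$instance_id" \
    --http-endpoint enabled \
    --http-tokens required \
    --http-put-response-hop-limit "$hop_limit"
}

# Example (placeholder instance ID):
#   enforce_imdsv2 i-0123456789abcdef0 2
```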

IAM Roles Anywhere Concept

At this point, we know that a workload assuming an IAM role is essentially using the SDK to obtain role credentials from instance metadata. In addition to the SDK and instance metadata, AWS also supports using an X.509 certificate to obtain role credentials. As a result, the workload no longer needs the AWS SDK, and it doesn’t rely on instance metadata from an EC2 instance. This mechanism is known as IAM Roles Anywhere, and it greatly expands the use cases for IAM roles.

To make this work, we first have to register a certificate authority (CA) with AWS as a trust anchor. It can be any X.509 CA, including AWS Private CA. IAM Roles Anywhere will allow any end entity endorsed by this trust anchor to assume the specified IAM role. We also need to create a profile, in which we can add IAM policies directly, or link to IAM roles whose trust policy names the service principal rolesanywhere.amazonaws.com.

To obtain role credentials, the requestor must provide both the private key and its end-entity certificate. The certificate proves the endorsement by the CA that the trust anchor specifies. The private key proves the requestor’s identity. The requestor uses the aws_signing_helper utility to request role credentials.

The utility is compatible with the credential_process feature in AWS config, which passes the returned role credentials to the AWS config profile for the AWS CLI or SDK running on an external virtual machine.
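To make the trust policy concrete, here is a sketch that writes one to a file. The policy body follows the general shape AWS documents for the rolesanywhere.amazonaws.com service principal; the function name, file name and role name are placeholders of mine, and a real policy would usually add conditions to narrow which certificates may assume the role.

```shell
# Write a trust policy that lets IAM Roles Anywhere assume the role.
# Function name and default file name are illustrative assumptions.
write_trust_policy() {
  out_file="${1:-trust-policy.json}"
  cat > "$out_file" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "rolesanywhere.amazonaws.com" },
      "Action": [
        "sts:AssumeRole",
        "sts:TagSession",
        "sts:SetSourceIdentity"
      ]
    }
  ]
}
EOF
}

# Example (hypothetical role name; needs IAM permissions):
#   write_trust_policy trust-policy.json
#   aws iam create-role --role-name roles-anywhere-demo \
#     --assume-role-policy-document file://trust-policy.json
```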

IAM Roles Anywhere Lab

Let’s tweak the three commands from this old post of mine to create the test materials: a self-signed CA and a certificate signed by the CA:

openssl req -x509 -sha256 -newkey rsa:4096 -keyout ca.key -out ca.crt -days 356 -nodes -subj '/CN=Health Certificate Authority' -addext basicConstraints=critical,CA:TRUE,pathlen:1 -addext keyUsage=keyCertSign

cat > ext.cnf <<EOF
[v3_leaf]
keyUsage = digitalSignature
basicConstraints=CA:false
EOF

openssl req -new -newkey rsa:4096 -keyout server.key -out server.csr -nodes -subj '/CN=*.digihunch.com'

openssl x509 -req -sha256 -days 365 -in server.csr -CA ca.crt -CAkey ca.key -set_serial 01 -out server.crt -extfile ext.cnf -extensions v3_leaf

I tweaked them to add the X.509 extensions required for signature validation. We need the following files from the output.

  • ca.crt -> the certificate of the CA. We provide this file as the trust anchor
  • server.crt -> the certificate of the server. We need it as the end-entity certificate
  • server.key -> the private key of the server. We present it to prove the identity of the requestor
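Before uploading anything, it can help to confirm locally that the three files fit together. The sketch below is my own addition rather than part of the original procedure: check_materials is a hypothetical helper that verifies the certificate chains to the CA and that the private key matches the certificate.

```shell
# Sanity-check the lab materials.
# Usage: check_materials ca.crt server.crt server.key
check_materials() {
  ca_cert="$1"; leaf_cert="$2"; leaf_key="$3"

  # 1. The end-entity certificate must chain to the CA used as trust anchor.
  openssl verify -CAfile "$ca_cert" "$leaf_cert" >/dev/null 2>&1 || {
    echo "certificate is not signed by the CA" >&2
    return 1
  }

  # 2. The private key must match the public key in the certificate.
  cert_pub=$(openssl x509 -in "$leaf_cert" -noout -pubkey) || return 1
  key_pub=$(openssl pkey -in "$leaf_key" -pubout) || return 1
  [ "$cert_pub" = "$key_pub" ] || {
    echo "private key does not match the certificate" >&2
    return 1
  }

  echo "materials look consistent"
}
```

This only checks local consistency. Whether IAM Roles Anywhere accepts the certificate still depends on the extensions and signature algorithm it requires.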

First, we go to the AWS console and create a new trust anchor, pasting the content of ca.crt as the certificate. Then we create a profile with an IAM role, with the trust policy looking like this example. After that, we can request the role credential with one command:

aws_signing_helper credential-process --certificate server.crt --private-key server.key --trust-anchor-arn $TRUST_ANCHOR_ARN --profile-arn $PROFILE_ARN --role-arn $ROLE_ARN

Moreover, if the workload supports the AWS SDK or can use the CLI but does not run on an EC2 instance, we can bake this into the AWS profile on the external machine:

[profile myprofile]
output = json
credential_process = aws_signing_helper credential-process --certificate /path/server.crt --private-key /path/server.key --trust-anchor-arn $TRUST_ANCHOR_ARN --profile-arn $PROFILE_ARN --role-arn $ROLE_ARN

We can even configure this in any pipeline as code to allow deployment from a non-AWS pipeline.
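As a rough sketch of what such a pipeline step might look like, the function below appends a profile to an AWS config file with the ARNs already expanded to literal values. The function name, the profile name and all paths and ARNs are placeholders of mine, and the sketch assumes aws_signing_helper is on the PATH of the pipeline runner.

```shell
# Append an AWS config profile that gets credentials via aws_signing_helper.
# Function name and argument order are illustrative assumptions.
write_roles_anywhere_profile() {
  config_file="$1"; profile_name="$2"
  cert="$3"; key="$4"
  trust_anchor_arn="$5"; profile_arn="$6"; role_arn="$7"

  mkdir -p "$(dirname "$config_file")" || return 1

  # The heredoc is unquoted on purpose so the values are written literally.
  cat >> "$config_file" <<EOF
[profile $profile_name]
credential_process = aws_signing_helper credential-process --certificate $cert --private-key $key --trust-anchor-arn $trust_anchor_arn --profile-arn $profile_arn --role-arn $role_arn
EOF
}

# Example pipeline step (all values are placeholders):
#   write_roles_anywhere_profile "$PWD/.aws/config" pipeline \
#     /secrets/server.crt /secrets/server.key \
#     "$TRUST_ANCHOR_ARN" "$PROFILE_ARN" "$ROLE_ARN"
#   export AWS_CONFIG_FILE="$PWD/.aws/config" AWS_PROFILE=pipeline
#   terraform plan
```

Writing the values at job time keeps the config file free of anything the runner would have to expand later, and the certificate and key can come from the pipeline's own secret store.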

Summary

In summary, apart from native AWS services, an IAM role can trust the following types of principals:

  • Native IAM identity such as an IAM user or another IAM role (an IAM group cannot be named as a principal)
  • Authenticated identity from a SAML identity provider that IAM is configured to trust
  • Authenticated identity from an OIDC identity provider that IAM is configured to trust
  • Validated identity endorsed by a Certificate Authority that IAM Roles Anywhere designates as a trust anchor

The first type is rarely used because few organizations use AWS IAM as their identity store. Most organizations have their own identity store with federation capability via SAML. On the other hand, a lot of modern applications adopt identity stores with OIDC compliance. Now with IAM Roles Anywhere, any entity with an X.509 identity can also assume an IAM role. It works with any CI/CD pipeline, whether or not it is self-hosted. The trade-off is that keeping the private keys safe becomes even more important.