Checking for Changes in AWS Tags in Terraform Plans Using Open Policy Agent

Open Policy Agent (OPA) is a policy engine that offers a unified way to enforce policies. It takes structured data (here, JSON) and lets you make decisions based on that data.

This data could, for example, be Kubernetes manifests; see OPA Gatekeeper for this use case. Netflix gave a talk (How Netflix Is Solving Authorization Across Their Cloud) about using OPA for authorization, e.g. of REST endpoints. And, as we will see later, OPA can also be used to evaluate Terraform plans.

This shows that OPA can be used for a wide range of use cases and can unify policy enforcement across them, at least to some degree, since the structured data will be quite different in each use case. What stays the same is the language used to describe the policies, which is called Rego.

Rego takes some getting used to, but it has language features that make evaluating rules easier. My approach here is to set a target for a simple policy and use it to figure out how Rego works. As a result, this certainly will not lead to the most optimal code :), so be warned!

I also will not go into detail about how to execute Rego scripts, as this is well explained in the OPA documentation.

Terraform Policy Check

One reason for me to evaluate OPA was to see whether it could help to automatically check Terraform plans before an apply. Some changes should be approved manually, but others may be approved automatically. One simple example is a change to the tags of AWS resources.

A tutorial on creating a policy for Terraform already exists. It goes in a slightly different direction, so I think my journey to check for changes in tags may still be helpful.

Structure of a Terraform Plan

First, we need to understand the structure of a Terraform Plan when exported to JSON.

{
  "format_version": "1.1",
  "terraform_version": "1.3.5",
  ...,
  "resource_changes": [
    {
      "address": "aws_instance.test1",
      ...
      "change": {
        "actions": [
          "update"
        ],
        "before": {
          "ami": "ami-1234567890",
          ...
          "tags": {
            "Name": "SomeName"
          },
          "tags_all": {
            "Name": "SomeName"
          }
        },
        "after": {
          "ami": "ami-1234567890",
          ...
          "tags": {
            "Name": "SomeNewName"
          },
          "tags_all": {
            "Name": "SomeNewName"
          }
        }
      }
    },
    ...
  ]
}

Here we can see the general structure of a planned change: the action is update, and the value of the Name tag changes from SomeName to SomeNewName. Everything else stays the same. The description of the change is split into two sections, before and after, which we will need to compare.

Check for updates of resource attributes

From the structure above, we see that we need to compare two objects for differing values. But the first step is to pick out those entries of resource_changes whose actions contain update.

This can easily be done using an array comprehension.

updates := [changes |
  changes := input.resource_changes[_]
  changes.change.actions[_] == "update"
]

The entry point for the supplied data is input. First, each entry of resource_changes is assigned to the variable changes, which is what the array comprehension collects. Then we iterate over all entries of change.actions and keep only those objects that contain update. The _ is a wildcard that iterates over all keys (here: array indices) without naming them. If a key is needed, a variable can be used instead, as in input.resource_changes[key], so that key can be referenced elsewhere. We will see this in one of the next steps.
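To make the difference between _ and a named variable concrete, here is a small sketch. update_positions is a hypothetical helper, not part of the policy; it takes the resource_changes array as an argument and binds the index to i, so the index can be returned together with the address:

```rego
package examples.named_index

# Hypothetical helper: same filter as the updates comprehension, but the
# array index is bound to the named variable i instead of the wildcard _,
# so it can be used in the comprehension head.
update_positions(resource_changes) := [[i, rc.address] |
	some i
	rc := resource_changes[i]
	rc.change.actions[_] == "update"
]
```

Indices start at 0, so an update in the second entry of resource_changes is reported with index 1.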

The updates comprehension looks quite reusable, as there are other actions besides update, so let us turn it into a function:

actions(type) := [changes |
  changes := input.resource_changes[_]
  changes.change.actions[_] == type
]
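The same filter works for any action. As a sketch, the hypothetical changes_with_action below is the same function as actions(type), except that it takes the resource_changes array as an argument instead of reading input, which makes it easy to test. With the original function, creations and deletions would simply be actions("create") and actions("delete"). One detail worth knowing: Terraform's JSON plan represents a replacement as ["delete", "create"] (or ["create", "delete"] when the new object is created first), so a replaced resource shows up in both of those lists but not in the list of updates.

```rego
package examples.action_types

# Same as actions(type), but taking the resource_changes array as an
# argument (hypothetical variant) instead of reading input.
changes_with_action(resource_changes, type) := [rc |
	rc := resource_changes[_]
	rc.change.actions[_] == type
]
```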

The function actions(type) extracts a list of changes. For each of them, we need to compare before and after. Let us introduce another function that lists the keys in which before and after differ for a given index idx into updates. Here we use a set comprehension instead of an array comprehension to have access to set operations later on. some is used to declare key as a local variable; otherwise, it could clash with global definitions.

changed_arguments(idx) := {key |
  some key
  updates[idx].change.before[key] != updates[idx].change.after[key]
}

some key creates a placeholder that iterates through all available keys, which are then filtered by the condition in the next line. This way, all keys whose values differ between before and after are collected. This already shows one of the strengths of Rego: usually we would need to write an explicit loop over the keys, whereas in Rego we can express this declaratively.

The resulting output when querying changed_arguments(0) is:

[
  [
    [
      "tags",
      "tags_all"
    ]
  ]
]

As expected, we receive the set of changed keys (rendered as a JSON array).
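One property of this comparison is worth spelling out: if a key is missing from either before or after, the expression before[key] != after[key] is undefined and the key is silently skipped. As I understand the Terraform JSON output format, after can be incomplete when some values are only known after apply (those are marked in after_unknown). The sketch below, with hypothetical function names, reproduces the comparison on plain objects and adds a stricter variant that also reports keys present on only one side:

```rego
package examples.key_diff

# Keys whose values differ, as in changed_arguments. If a key is missing
# from either object, the comparison is undefined and the key is skipped.
differing_keys(before, after) := {key |
	some key
	before[key] != after[key]
}

# All keys of an object. "_ = obj[key]" is used instead of "obj[key]" so
# that keys whose value is false are not dropped.
key_set(obj) := {key |
	some key
	_ = obj[key]
}

# Keys present in exactly one of the two objects.
one_sided_keys(before, after) := (key_set(before) - key_set(after)) | (key_set(after) - key_set(before))

# Hypothetical stricter variant: differing keys plus one-sided keys.
changed_or_missing_keys(before, after) := differing_keys(before, after) | one_sided_keys(before, after)
```

Whether the stricter variant is needed depends on the plans you feed in; for the tag check, reporting too many keys only means one more manual approval.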

Now we want to create a condition that checks that only the tags have changed. There may be a better way, but since changed_arguments already returns a set, removing the tag-related entries and checking whether anything remains felt simple enough to start with.

only_tag_changes(idx) := count(changed_arguments(idx) - {"tags", "tags_all"}) == 0
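The check relies on set difference: keys - {"tags", "tags_all"} is empty exactly when every changed key is a tag key. A small sketch with a hypothetical only_tags function, which takes the set of keys directly, shows the edge cases, including that an empty set of changed keys also passes:

```rego
package examples.only_tags

# Attributes whose changes may be approved automatically.
tag_keys := {"tags", "tags_all"}

# True if keys is a subset of tag_keys. An empty set also yields true.
only_tags(keys) := count(keys - tag_keys) == 0
```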

With all of this set up, we now need to evaluate the condition for all indices of updates, i.e. for all updated resources in the plan. We do this by collecting a list of resources in which more keys than just the tags change:

not_only_tag_changes_for_all_resources := [name |
    some idx
    name := updates[idx].address
    not only_tag_changes(idx)
]

Iterating over indices feels a bit unnatural here, as we have to obtain the indices from updates. It is not possible to write some idx and let only_tag_changes(idx) enumerate the indices, because functions do not support this kind of automatic iteration; their arguments must already be bound. See Functions Versus Rules.
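Another way around the indices, while staying with functions, is to pass the change object itself instead of its index. The sketch below is a hypothetical rework of changed_arguments(idx) and only_tag_changes(idx) along these lines; it takes the list of updates as an argument so it can run on its own:

```rego
package examples.by_object

tag_keys := {"tags", "tags_all"}

# Hypothetical variant of changed_arguments(idx) that receives the
# resource change object itself instead of an index into updates.
changed_keys(rc) := {key |
	some key
	rc.change.before[key] != rc.change.after[key]
}

# Addresses of all updates that change more than the tags. The
# comprehension binds each element to rc directly, so no index is needed.
non_tag_updates(updates) := [rc.address |
	rc := updates[_]
	count(changed_keys(rc) - tag_keys) != 0
]
```

In the policy itself, this would be called as non_tag_updates(updates).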

Therefore, let us try rules instead.

From functions to rules

The idea now is to stop iterating over indices and to use the address of the resource itself as the key. The resulting rule maps resource addresses to the set of attributes that will be updated.

changed_arguments[name] := keys {
 update := updates[_]
 name := update.address
 keys := {key |
  some key
  update.change.before[key] != update.change.after[key]
 }
}

The next step is to figure out for which resources only the tags change. The core of this rule is still similar to the function.

only_tag_changes[key] := cnt {
 some key
 changed_argument := changed_arguments[key]
 cnt := count(changed_argument - {"tags", "tags_all"}) == 0
}

Let us check the output, as we did before. This time, it shows the result for all updated resources:

[
  [
    {
      "aws_instance.test1": true,
      "aws_instance.test2": false
    }
  ]
]

Finally, we still want just a list of resources in which more than the tags change:

not_only_tag_changes_collected_for_all_resources := [name |
 some name
 only_tag_changed := only_tag_changes[name]
 only_tag_changed == false
]
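The intermediate map of booleans is not strictly necessary. As an alternative sketch, a partial set rule can collect the offending addresses directly from changed_arguments. The block repeats updates and changed_arguments so it runs on its own; the rule name not_only_tag_changes_set is hypothetical, and it uses the same pre-1.0 rule syntax as the rest of this post:

```rego
package examples.compact

updates := [rc |
	rc := input.resource_changes[_]
	rc.change.actions[_] == "update"
]

# Same rule as in the text: address -> set of changed top-level keys.
changed_arguments[name] := keys {
	update := updates[_]
	name := update.address
	keys := {key |
		some key
		update.change.before[key] != update.change.after[key]
	}
}

# Hypothetical compact alternative: collect the offending addresses
# directly, without the intermediate map of booleans.
not_only_tag_changes_set[name] {
	keys := changed_arguments[name]
	count(keys - {"tags", "tags_all"}) != 0
}
```

Because the result is a set of strings, it can be passed to concat in a deny message just like the array.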

Collecting the remaining types of changes

But before we can do the final check, we need to make sure that nothing is created or deleted. To be safe, let us create a function that picks up the complement of a given set of actions. This way, any change with an action other than update, read, or no-op is caught instead of being ignored. See the Terraform plan change representation.

complement_actions(types) := [changes |
    changes := input.resource_changes[_]

    # Convert the list of actions to a set to use the difference operator
    actions := {action_value |
        action_value := changes.change.actions[_]
    }
    
    # Check if actions remain. These are the complement actions.
    count(actions - types) != 0
]

Using this we define

non_updates := complement_actions({"update", "read", "no-op"})

which lists all resource changes with actions other than update, read, or no-op, i.e. those that create, delete, or replace resources. Defining it as the complement of the safe actions feels safer: it is better to approve one change too many manually than to accidentally delete a resource just because some name changed.
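To check that the complement behaves as intended for replacements and creations, here is a sketch with hypothetical names: complement_changes is complement_actions with the resource_changes array passed in explicitly, and flagged_addresses collects the addresses, similar to the names comprehension in the deny rule further below:

```rego
package examples.complement

safe_actions := {"update", "read", "no-op"}

# Hypothetical variant of complement_actions(types) that takes the
# resource_changes array as an argument, so it can be tested without input.
complement_changes(resource_changes, types) := [rc |
	rc := resource_changes[_]
	action_set := {a | a := rc.change.actions[_]}
	count(action_set - types) != 0
]

# Addresses of all changes with at least one action outside safe_actions.
flagged_addresses(resource_changes) := [rc.address |
	flagged := complement_changes(resource_changes, safe_actions)
	rc := flagged[_]
]
```

A no-op and a pure update pass, while both a replacement (["delete", "create"]) and a plain create are flagged.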

Now we can tie all of this together into a condition that checks that all updates only contain tag changes and that there are no creations or deletions.

default allow := false
allow {
    count(not_only_tag_changes_collected_for_all_resources) == 0
    count(non_updates) == 0
}

Rejection Reasons

This is all quite helpful already, but we only get true or false as a response. It would be helpful to know which changes led to the decision. One possibility is to use rules that contain the reason for denying a plan. The first line in the body of each rule contains the condition for denying; the following lines build the reason message. Whenever a condition is met, an entry with the respective reason is added to deny.

deny[reason] {
    count(not_only_tag_changes_collected_for_all_resources) != 0
    reason := sprintf("For the following resources not just tags will be updated: %s", [concat(", ", not_only_tag_changes_collected_for_all_resources)])
}

deny[reason] {
    count(non_updates) != 0
    names := [name |
        name := non_updates[_].address
    ]

    reason := sprintf("For the following resources will be created or deleted: %s", [concat(", ", names)])
}

default allow := false
allow {
    count(deny) == 0
}

A possible response is

[
  [
    [
      "For the following resources not just tags will be updated: aws_instance.test1",
      "For the following resources will be created or deleted: aws_instance.test2"
    ]
  ]
]

This is the case if, besides the tags, other values of aws_instance.test1 are planned to change, and aws_instance.test2 is planned to be created or deleted.

Querying allow returns false, as expected:

[
  [
    false
  ]
]

Resulting Rego Script

package terraform.tags

# All resource changes whose actions contain the given type.
actions(type) := [changes |
	changes := input.resource_changes[_]
	changes.change.actions[_] == type
]

# All resource changes with at least one action that is not in types.
complement_actions(types) := [changes |
	changes := input.resource_changes[_]
	actions := {action_value |
		action_value := changes.change.actions[_]
	}
	count(actions - types) != 0
]

updates := actions("update")

# Anything other than update, read, or no-op, e.g. create or delete.
non_updates := complement_actions({"update", "read", "no-op"})

# Maps each updated resource address to the set of top-level attributes
# whose values differ between before and after.
changed_arguments[name] := keys {
	update := updates[_]
	name := update.address
	keys := {key |
		some key
		update.change.before[key] != update.change.after[key]
	}
}

# Maps each updated resource address to true if only tags change, else false.
only_tag_changes[key] := cnt {
	some key
	changed_argument := changed_arguments[key]
	cnt := count(changed_argument - {"tags", "tags_all"}) == 0
}

# Addresses of all updated resources in which more than the tags change.
not_only_tag_changes_collected_for_all_resources := [name |
	some name
	only_tag_changed := only_tag_changes[name]
	only_tag_changed == false
]

deny[msg] {
	count(not_only_tag_changes_collected_for_all_resources) != 0
	msg := sprintf("For the following resources not just tags will be updated: %s", [concat(", ", not_only_tag_changes_collected_for_all_resources)])
}

deny[msg] {
	count(non_updates) != 0
	names := [name |
		name := non_updates[_].address
	]
	msg := sprintf("For the following resources will be created or deleted: %s", [concat(", ", names)])
}

# Allow the plan only if no deny rule produced a message.
default allow := false

allow {
	count(deny) == 0
}