
Understanding and Managing Terraform Drift: A Comprehensive Guide

In the world of Infrastructure as Code (IaC), Terraform has emerged as a powerful tool for managing and provisioning cloud resources. However, as with any system that manages real-world infrastructure, discrepancies can arise between the desired state (as defined in your Terraform code) and the actual state of your resources.

This phenomenon is known as “Terraform drift.” In this article, we’ll explore what Terraform drift is, why it occurs, and how to detect and manage it effectively.

What is Terraform Drift?

Terraform drift occurs when the actual state of your infrastructure diverges from the state defined in your Terraform configuration. This can happen for various reasons, such as:

      • Manual changes made directly to resources outside of Terraform
      • External processes or scripts modifying resources
      • Changes made by other team members without updating the Terraform code
      • Automatic updates or modifications by the cloud provider

When drift occurs, it can lead to inconsistencies, unexpected behavior, and potential security risks in your infrastructure.

Why is Drift Detection Important?

Detecting and managing drift is crucial for several reasons:

  • Consistency: Ensures your infrastructure remains in sync with your IaC definitions
  • Security: Helps identify unauthorized or potentially harmful changes
  • Compliance: Aids in maintaining compliance with regulatory requirements
  • Troubleshooting: Makes it easier to diagnose and resolve issues in your infrastructure

How to Detect Terraform Drift

Terraform provides built-in commands to help detect drift:

Terraform Plan

The `terraform plan` command first refreshes Terraform's view of your existing resources (unless you pass `-refresh=false`). It then compares that view with the desired state defined in your configuration files and shows what changes would be made if you applied the current configuration.

terraform plan

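If there's drift, the output shows which resources changed outside Terraform and how they differ from your configuration. A plan also includes configuration edits you have not applied yet, so not every proposed change is drift. To see only the differences between the recorded state and the real infrastructure, run `terraform plan -refresh-only`.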
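This check can be scripted. The minimal Python sketch below assumes that the `terraform` binary is on `PATH` and that the working directory has already been initialized with `terraform init`. It relies on the documented `-detailed-exitcode` flag, which makes `terraform plan` exit with 0 when there are no changes, 1 on error, and 2 when changes are present. The function names are illustrative and are not part of any Terraform tooling.

```python
import subprocess

# Exit statuses documented for `terraform plan -detailed-exitcode`.
NO_CHANGES, ERROR, CHANGES_PRESENT = 0, 1, 2


def interpret_plan_exit_code(code: int) -> str:
    """Translate a `terraform plan -detailed-exitcode` status into a short label."""
    if code == NO_CHANGES:
        return "in sync"
    if code == CHANGES_PRESENT:
        # Drift OR unapplied configuration edits: the exit code alone cannot tell them apart.
        return "changes detected"
    if code == ERROR:
        return "plan failed"
    return f"unexpected exit code {code}"


def check_for_drift(workdir: str) -> str:
    """Run a non-interactive plan in an initialized directory and label the result.

    Assumes the `terraform` binary is available on PATH.
    """
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return interpret_plan_exit_code(result.returncode)
```

For example, `check_for_drift("./infra")` (a hypothetical path) returns `"changes detected"` when the plan proposes any change. Separating the exit-code interpretation from the subprocess call keeps the decision logic testable without a Terraform installation.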

Terraform Show

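The `terraform show` command displays the state as last recorded in your state file (or the contents of a saved plan file). It does not query your infrastructure, so it reflects the most recent refresh or apply rather than live values. You can use it to manually compare the recorded state against your configuration files.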

terraform show
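For scripted comparisons, a saved plan is easier to inspect than the human-readable output. Running `terraform plan -out=tfplan` followed by `terraform show -json tfplan > plan.json` writes the plan in Terraform's JSON format. In that format, the `resource_drift` list describes changes detected outside Terraform and the `resource_changes` list describes the planned actions. The minimal sketch below assumes that format; the file names `tfplan` and `plan.json` are arbitrary.

```python
import json


def load_plan(path: str) -> dict:
    """Load the JSON written by `terraform show -json <planfile>`."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def drifted_resources(plan: dict) -> list[str]:
    """Addresses of resources Terraform reports as changed outside of Terraform."""
    # `resource_drift` may be absent (for example, when nothing drifted), so default to [].
    return [entry["address"] for entry in plan.get("resource_drift", [])]


def planned_actions(plan: dict) -> dict[str, list[str]]:
    """Map each resource address to its planned actions, skipping no-ops."""
    return {
        rc["address"]: rc["change"]["actions"]
        for rc in plan.get("resource_changes", [])
        if rc["change"]["actions"] != ["no-op"]
    }
```

Comparing the two lists is one way to separate resources that drifted from resources that only have pending configuration changes.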

Terraform Refresh

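The `terraform refresh` command updates the state file to match the real-world infrastructure without modifying the infrastructure itself. Since Terraform v0.15.4 it is deprecated because it overwrites the state without letting you review what changed. Use `terraform apply -refresh-only` instead, which shows the detected changes and asks for confirmation before updating the state.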

terraform refresh

Managing Terraform Drift

Once you’ve detected drift, there are several strategies to manage it:

1. Reconcile the Drift

          • Update your Terraform code to match the current state of your infrastructure
          • Use `terraform import` to bring resources that were created outside Terraform under Terraform management

2. Enforce the Desired State

          • Run `terraform apply` to bring your infrastructure back in line with your configuration
          • Review the plan carefully first, as reverting drift can modify, replace, or delete resources

3. Implement Preventive Measures

          • Use version control for your Terraform code
          • Implement a proper CI/CD pipeline for infrastructure changes
          • Utilize Terraform workspaces for better isolation of environments

4. Regular Audits

          • Schedule regular drift detection checks
          • Automate drift detection and reporting
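Scheduled checks can be as simple as a job that plans each environment and fails when any of them drifts. Below is a minimal Python sketch. It assumes one initialized Terraform working directory per environment and relies on the documented `-detailed-exitcode` behavior, where exit code 2 means changes are present. The directory layout, function names, and report format are hypothetical.

```python
import subprocess
from pathlib import Path

# Exit statuses documented for `terraform plan -detailed-exitcode`.
NO_CHANGES, CHANGES_PRESENT = 0, 2


def plan_exit_code(workdir: Path) -> int:
    """Run a non-interactive plan and return its detailed exit code (needs terraform on PATH)."""
    return subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
    ).returncode


def build_report(results: dict[str, int]) -> tuple[str, int]:
    """Summarize per-directory exit codes; return (report text, exit status for the job)."""
    drifted = sorted(d for d, code in results.items() if code == CHANGES_PRESENT)
    failed = sorted(d for d, code in results.items() if code not in (NO_CHANGES, CHANGES_PRESENT))
    lines = [f"drift: {d}" for d in drifted] + [f"error: {d}" for d in failed]
    if not lines:
        lines = ["all directories in sync"]
    # Fail the scheduled job if anything drifted or could not be planned.
    status = 1 if (drifted or failed) else 0
    return "\n".join(lines), status
```

A CI job might collect `{str(d): plan_exit_code(d) for d in Path("environments").iterdir() if d.is_dir()}`, print the report, and exit with the returned status so that drift fails the pipeline. Treating plan errors as failures keeps a broken environment from silently passing the audit.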

5. Ignore Specific Drifts

          • In some cases, it may be appropriate to ignore certain types of drift
          • This is especially useful when the drift is expected or doesn’t impact the overall functionality

Example: Manually updating an SSM parameter after creation

To ignore changes to specific attributes, use the `lifecycle` meta-argument with `ignore_changes` in the resource block. Here's an example for an SSM parameter whose value is updated manually after creation:

resource "aws_ssm_parameter" "example" {
  name  = "/example/parameter"
  type  = "String"
  value = "initial_value"

  lifecycle {
    ignore_changes = [
      value,
    ]
  }
}

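In this example, Terraform sets `value` to "initial_value" when it creates the parameter. After that, changes made to the value outside of Terraform are ignored in subsequent plans and applies. Note that later edits to `value` in the configuration are ignored as well.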

When deciding whether to ignore drift, consider the following:

  • Is the drift expected as part of normal operations?
  • Does the drift impact the security or overall functionality of your infrastructure?
  • Is the drifted resource managed by another process or team?
  • Would reconciling the drift cause unnecessary disruption?

Remember, while ignoring drift can be useful in certain scenarios, it should be done thoughtfully and documented clearly to ensure all team members understand why certain changes are being ignored.

Detecting and Resolving Drift in a Cloud Environment

Let’s consider a scenario where a team is managing a cloud-based web application using Terraform. They discover that one of their EC2 instances has been manually resized outside of Terraform.

Step 1: Detect the Drift

Running `terraform plan` reveals the discrepancy:

# aws_instance.web_server will be updated in-place
  ~ resource "aws_instance" "web_server" {
      ~ instance_type = "t2.small" -> "t2.micro"
        # (other attributes unchanged)
    }
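To list every attribute that differs, not only the ones highlighted in the human-readable output, a script can compare the `before` and `after` values of a resource in the JSON plan (`terraform plan -out=tfplan`, then `terraform show -json tfplan`). Here, `before` holds the value Terraform read from the existing resource, and `after` holds the value the configuration would set. The minimal sketch below only compares top-level attributes, and the function name is hypothetical.

```python
def changed_attributes(change: dict) -> dict[str, tuple]:
    """Return top-level attributes whose values differ between `before` and `after`.

    `change` is the `change` object of one entry in a JSON plan's
    `resource_changes` (or `resource_drift`) list.
    """
    # `before` is null for creates and `after` is null for deletes; treat both as empty.
    before = change.get("before") or {}
    after = change.get("after") or {}
    diffs = {}
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        if old != new:
            diffs[key] = (old, new)
    return diffs
```

Values that are only known after apply may be missing from `after`, so they can appear here as changing to `None`. Checking the plan's `after_unknown` field before reporting them avoids false alarms.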

Step 2: Investigate the Change

The team investigates and discovers that the instance was resized manually to handle increased traffic.

Step 3: Decide on a Course of Action

After discussion, the team decides to update their Terraform configuration to reflect this change, as the larger instance size is now required.

Step 4: Update Terraform Configuration

The team updates their `main.tf` file:

resource "aws_instance" "web_server" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t2.small" # Updated from t2.micro
  # ... other configuration ...
}

Step 5: Apply Changes and Resolve Drift

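They run `terraform plan` to confirm that Terraform no longer proposes changing the instance type, then run `terraform apply` to record the reconciled state. Because the configuration now matches the resized instance, the apply should not modify the instance itself.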

Best Practices for Preventing Terraform Drift

To minimize the occurrence of drift, consider implementing these best practices:

  • Use Terraform for All Changes: Avoid making manual changes to resources outside of Terraform.
  • Implement Strong Access Controls: Limit who can make direct changes to your infrastructure.
  • Educate Your Team: Ensure all team members understand the importance of using Terraform for infrastructure changes.
  • Use Remote State: Store your Terraform state file remotely to ensure consistency across team members.
  • Implement State Locking: Prevent concurrent Terraform runs from corrupting or overwriting the state.
  • Regular Drift Checks: Periodically run `terraform plan` (or `terraform plan -refresh-only`) to catch external changes before they accumulate.

Conclusion

Terraform drift is an inevitable challenge in managing infrastructure as code. By understanding what causes drift, how to detect it, and implementing best practices to manage and prevent it, you can ensure that your infrastructure remains consistent, secure, and compliant. Regular audits, strong access controls, and a commitment to using Terraform for all infrastructure changes will go a long way in minimizing drift and maintaining the integrity of your infrastructure.

Remember, the goal is not to eliminate drift entirely (which is nearly impossible) but to detect it quickly and have robust processes in place to manage it effectively when it occurs.

About The Author(s)

Moiz Zubair, a seasoned Software Systems and DevOps Engineer, also excels as an SRE/Platform Engineer. With a wealth of experience in creating robust systems and refining deployment processes, Moiz is dedicated to ensuring infrastructure reliability and efficiency. In his latest article, “What’s Terraform Drift and How to Handle It,” Moiz shares his insights on managing infrastructure drift, offering practical solutions drawn from his extensive expertise.
