From Simple to Sophisticated: Evolving Terraform Infrastructure for Azure with CI/CD and Governance
In this comprehensive guide, Hidde de Smet documents the step-by-step evolution of Terraform infrastructure for Azure. The post provides real-world insights and actionable patterns for teams modernizing their infrastructure-as-code, from basic setup to advanced automation and governance.
From Simple to Sophisticated: Terraform Infrastructure Evolution
By Hidde de Smet
This post documents a practical journey in evolving Terraform-based Azure infrastructure from basic, single-file deployments to a modular, automated, and governed state. The guide emphasizes not just technical steps, but also the rationale—enabling teams to improve their own practices incrementally.
Table of Contents
- The Starting Point: Simple but Limited
- Evolution Phase 1: Breaking Down the Monolith
- Evolution Phase 2: Standardization and Governance
  - Naming Conventions and Environment Separation
  - Validation and Comprehensive Tagging
  - Workspace vs. Environment Separation Strategies
- Evolution Phase 3: Automation and CI/CD
- Evolution Phase 4: Comprehensive Testing
- The Advanced Features: Beyond the Basics
  - Policy as Code
  - Cost Management Foundation
  - Infrastructure Monitoring Tools
  - Alternative Approaches
- Key Lessons Learned
- What’s Next: The Roadmap Ahead
- Getting Started: Your Evolution Path
- Conclusion: Evolution Over Revolution
The Starting Point: Simple but Limited
Version 0.1.0 - The Basic Foundation
- Monolithic main.tf containing all Azure resources:
  - Resource Group
  - Virtual Network and Subnet
  - Network Security Group
  - Storage Account and Container
  - App Service Plan and Linux Web App
  - Key Vault
- Pain Points:
  - One massive file, no modularity or reusability
  - Manual, error-prone deployment
  - No standardized naming or documentation
  - Scaling and collaboration are difficult
Evolution Phase 1: Breaking Down the Monolith
Version 0.2.0 – Modular Architecture
- Split resources into logical, reusable modules:
  - modules/network (VNet, Subnet, NSG)
  - modules/storage (Storage Account, Containers)
  - modules/webapp (App Service Plan & Web App)
  - modules/keyvault (Key Vault)
Key Improvements:
- Reusability across environments
- Maintainability (easier debugging & updates)
- Collaboration (parallel work)
- Module-level testing
Lesson: Start with logical module boundaries, even for small projects. Modularization saves refactoring effort as needs grow.
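As a rough sketch of what the root configuration might look like after the split: the module paths come from the list above, but every input and output name here (such as subnet_id) is an illustrative assumption, since the post does not show the module interfaces.

```hcl
# Root main.tf after the split: one module call per concern.
# Variable, input, and output names are hypothetical.

variable "resource_group_name" {
  type        = string
  description = "Name of the resource group that holds all resources."
}

variable "location" {
  type        = string
  description = "Azure region, e.g. westeurope."
}

resource "azurerm_resource_group" "main" {
  name     = var.resource_group_name
  location = var.location
}

module "network" {
  source              = "./modules/network"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
}

module "storage" {
  source              = "./modules/storage"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
}

module "webapp" {
  source              = "./modules/webapp"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  subnet_id           = module.network.subnet_id # assumed output of the network module
}

module "keyvault" {
  source              = "./modules/keyvault"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
}
```

Passing values between modules through explicit outputs keeps each module testable on its own.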
Evolution Phase 2: Standardization and Governance
Version 0.3.0 – Naming Conventions and Environment Separation
- Added a naming module for consistent Azure resource naming, aligned with Azure CAF abbreviations
- Introduced dedicated dev.tfvars and prod.tfvars files for environments
- Used Terraform workspaces for simple state separation
Example naming module logic:
locals {
  resource_type_abbreviations = { resource_group = "rg", ... }
  resource_group_name = var.resource_group != "" ? var.resource_group : "${var.prefix}-${local.resource_type_abbreviations.resource_group}-${var.environment}-${var.suffix}"
}
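The locals above depend on four input variables. A minimal sketch of how they might be declared follows; the descriptions, the empty-string default, and the output are assumptions rather than the author's actual module code.

```hcl
# modules/naming/variables.tf (assumed file layout)

variable "prefix" {
  type        = string
  description = "Short project or workload prefix."
}

variable "environment" {
  type        = string
  description = "Environment name, e.g. dev or prod."
}

variable "suffix" {
  type        = string
  description = "Instance suffix, e.g. 001."
}

variable "resource_group" {
  type        = string
  default     = "" # empty means: generate the name from the parts above
  description = "Optional explicit resource group name that overrides the generated one."
}

# Expose the computed name so other modules and tests can read it.
output "resource_group_name" {
  value = local.resource_group_name
}

# With a dev.tfvars containing (placeholder values):
#   prefix      = "test"
#   environment = "dev"
#   suffix      = "001"
# the expression in locals evaluates to "test-rg-dev-001".
```

The override variable lets a team adopt the module even when some resources already exist under legacy names.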
Version 0.4.0 – Validation and Comprehensive Tagging
- Validation module: Ensured names meet Azure constraints (length, allowed characters, variations per resource type)
- Tagging module: Standardized tags (Environment, Owner, Cost Center, etc.), automated metadata tracking (created date, Terraform version)
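One way such name checks could be expressed is with Terraform's built-in variable validation. The sketch below is not the author's validation module; the variable names are hypothetical, and the rules encode Azure's documented constraints for storage account and Key Vault names.

```hcl
variable "storage_account_name" {
  type        = string
  description = "Candidate storage account name."

  validation {
    # Storage accounts: 3-24 characters, lowercase letters and digits only.
    condition     = can(regex("^[a-z0-9]{3,24}$", var.storage_account_name))
    error_message = "Storage account names must be 3-24 characters long and use only lowercase letters and digits."
  }
}

variable "key_vault_name" {
  type        = string
  description = "Candidate Key Vault name."

  validation {
    # Key Vaults: 3-24 characters, start with a letter, end with a letter or digit.
    condition     = can(regex("^[a-zA-Z][a-zA-Z0-9-]{1,22}[a-zA-Z0-9]$", var.key_vault_name))
    error_message = "Key Vault names must be 3-24 characters, start with a letter, and end with a letter or digit."
  }

  validation {
    # Consecutive hyphens are not allowed.
    condition     = length(regexall("--", var.key_vault_name)) == 0
    error_message = "Key Vault names must not contain consecutive hyphens."
  }
}
```

Because the rules differ per resource type, keeping one validation per variable makes the error message point at the exact constraint that failed.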
Impact:
- Moved from ad-hoc deployments to auditable, compliant infrastructure
- Enabled automation, cost tracking, and troubleshooting
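A tagging module along these lines could be sketched as follows. This is an assumption-laden illustration: the variable names are hypothetical, and the hashicorp/time provider's time_static resource is used here as one possible way to record a creation date that stays stable across plans.

```hcl
terraform {
  required_providers {
    time = {
      source = "hashicorp/time"
    }
  }
}

variable "environment" {
  type = string
}

variable "owner" {
  type = string
}

variable "cost_center" {
  type = string
}

variable "extra_tags" {
  type        = map(string)
  default     = {}
  description = "Optional caller-supplied tags."
}

# Stores the timestamp once, at creation, so the tag does not change on every run.
resource "time_static" "created" {}

locals {
  standard_tags = {
    Environment   = var.environment
    Owner         = var.owner
    "Cost Center" = var.cost_center # keys containing spaces must be quoted
    CreatedDate   = formatdate("YYYY-MM-DD", time_static.created.rfc3339)
  }

  # Later arguments to merge() win, so standard tags cannot be overridden by extra_tags.
  tags = merge(var.extra_tags, local.standard_tags)
}

output "tags" {
  value = local.tags
}
```

Resources would then set tags to this module's output instead of declaring tags inline.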
Workspace vs Environment Separation
- Terraform Workspaces: One shared backend and configuration with a separate state per workspace; easy to set up, but riskier and with limited environment isolation
- Separate Directories (Recommended):
  - Full state isolation
  - Siloed configs for CI/CD and security
  - Slight code duplication, but safer for enterprise
Lesson: Start with workspaces for simplicity, migrate to separate directories/backends as needs grow.
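With separate directories, each environment would typically carry its own backend block. The following is a sketch under assumptions: the file path and all resource names are placeholders, not values from the post.

```hcl
# environments/prod/backend.tf (hypothetical path and names)
# A matching file under environments/dev would point at a different
# storage account or key, so the two states never share a location.

terraform {
  backend "azurerm" {
    resource_group_name  = "rg-tfstate-prod"
    storage_account_name = "sttfstateprod001"
    container_name       = "tfstate"
    key                  = "prod.terraform.tfstate"
  }
}
```

Separate state locations also allow separate access controls, so a development pipeline can be denied access to production state entirely.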
Evolution Phase 3: Automation and CI/CD
Version 0.5.0 – GitHub Actions Integration
- Implemented CI/CD with environment-specific protection rules
- Branch-based strategy:
  - develop branch → deploys to development
  - main branch → deploys to production
  - Feature branches → PR validations only
- Integrated GitHub environment protection & Azure Service Principal auth
- Automated plan & apply, manual dispatch for emergencies
Workflow Sample:
name: 'Terraform Deploy'
on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main, develop]
  workflow_dispatch:
    ...
jobs:
  terraform-check:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: $
      ...
Results:
- Manual deployment time cut from 20+ minutes to seconds
- Reduced errors, improved approval/audit process
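On the Terraform side, Service Principal authentication in a pipeline usually needs no credentials in code. A minimal sketch, assuming the workflow exports the standard ARM_* environment variables from its secrets (the post does not show this part):

```hcl
terraform {
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
    }
  }
}

provider "azurerm" {
  features {}

  # Nothing secret is set here. The provider reads the Service Principal
  # credentials from the environment:
  #   ARM_CLIENT_ID, ARM_CLIENT_SECRET, ARM_TENANT_ID, ARM_SUBSCRIPTION_ID
}
```

Keeping the provider block free of credentials means the same code runs unchanged against development and production; only the secrets attached to each GitHub environment differ.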
Evolution Phase 4: Comprehensive Testing
Version 0.6.0 – Terratest Implementation
- Developed a test suite using Terratest:
  - Validation tests (syntax/config)
  - Module tests (logic in isolation)
  - Infrastructure tests (end-to-end, takes longer)
  - Naming convention tests
- Created Makefile targets for standardized workflows:
make test                 # Quick validation
make test-all             # Full suite
make test-modules         # Test individual modules
make test-infrastructure  # Full deployment tests
Example:
func TestNamingConventions(t *testing.T) {
    terraformOptions := &terraform.Options{ ... }
    terraform.InitAndApply(t, terraformOptions)
    resourceGroupName := terraform.Output(t, terraformOptions, "resource_group_name")
    assert.Contains(t, resourceGroupName, "test-rg-dev-001")
    ...
}
Impact:
- Early issue detection (e.g., Azure storage container naming compliance)
- Production risks caught before infra deployment
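A Terratest run like the naming test above needs a small Terraform configuration to apply. The fixture below is a guess at what that could look like: the path is hypothetical, and it assumes the naming module takes prefix, environment, and suffix as inputs and exposes a resource_group_name output.

```hcl
# test/fixtures/naming/main.tf (hypothetical path)

module "naming" {
  source      = "../../../modules/naming"
  prefix      = "test"
  environment = "dev"
  suffix      = "001"
}

# The Go test reads this output via terraform.Output(...).
output "resource_group_name" {
  value = module.naming.resource_group_name
}
```

With these inputs and the "rg" abbreviation from the naming logic, the generated name would be test-rg-dev-001, which is the string the test asserts on.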
The Advanced Features: Beyond the Basics
Current State: Enterprise-Ready Infrastructure
- Policy as Code: OPA (Open Policy Agent) for security/tagging enforcement, validated via Python script or direct OPA/Checkov integration
  deny[msg] { ... }
- Cost Management Foundation: Infracost integration for cost estimation, reporting, and optimization tracking
- Monitoring Tools:
  - Drift detection comparing state vs. actual
  - Automated notifications & markdown reporting
Alternative Approaches
- Use native Terraform validation (terraform plan, terraform validate)
- Checkov for security scanning as a single binary/tool
- Infracost CLI or Azure CLI for cost estimation
- Shell scripts for drift detection (terraform plan -detailed-exitcode)
- All tools orchestrated via GitHub Actions or Makefile
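For teams that want a tagging rule without running OPA, part of it can be approximated in plain Terraform. The sketch below is not an OPA policy and not the author's implementation; it only checks that a hypothetical tags variable carries the standard keys named earlier in the post.

```hcl
variable "tags" {
  type        = map(string)
  description = "Tags applied to every resource."

  validation {
    # Every required key must be present in the supplied map.
    condition = alltrue([
      for key in ["Environment", "Owner", "Cost Center"] : contains(keys(var.tags), key)
    ])
    error_message = "Tags must include Environment, Owner, and Cost Center."
  }
}
```

The check runs when Terraform evaluates the variable, for example during terraform plan, so a missing tag stops the run before anything is created. It is narrower than a policy engine, since it only sees this one variable and not the full plan.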
Documentation
- Architecture Decision Records (ADR) for design decisions
- Automated Terraform diagrams & comprehensive module docs
Key Lessons Learned
- Start simple, evolve systematically:
  - Focus on stepwise maturity, not big-bang redesigns
  - Learn at each stage and keep infra operational
- Governance is not optional:
  - Enforce naming, tags, validation from early stages
  - Enables tracking, security, and simplification
- Test your infrastructure code:
  - Find and fix issues before production
  - Ensures confidence and repeatability
- Automate early and often:
  - CI/CD saves time and reduces errors
  - Consistent, secure, auditable deployments
- Comprehensive documentation:
  - Accelerates onboarding and teamwork
  - Preserves the rationale behind decisions and upgrades
What’s Next: The Roadmap Ahead
Short-term Goals:
- Automate policy validation and drift detection in CI
- Integrate Infracost cost estimation in PR workflows
- Improve monitoring and alerting
Long-term Vision:
- Fully automated policy enforcement
- Real-time optimization and alerts
- Self-healing infra with drift remediation
- Advanced security scanning (Checkov, tfsec)
Getting Started: Your Evolution Path
- Phase 1: Basic functionality, plan for modules
- Phase 2: Naming conventions and basic governance
- Phase 3: Validation and comprehensive tagging
- Phase 4: CI/CD automation with branch controls
- Phase 5: Full testing with Terratest
- Phase 6: Advanced features—policies, cost monitoring, drift detection
Conclusion: Evolution Over Revolution
A systematic, phased migration—from a simple script to a robust platform—enabled:
- Maintaining infrastructure during transformation
- Gradual team skill growth
- Value delivery at every step
- A future-proof foundation for continued maturity
Author: Hidde de Smet, Azure Solution Architect, specializing in cloud design and management using Scrum and DevOps methodologies.
This post appeared first on “Hidde de Smet’s Blog”.