An infrastructure engineer left a company. Not dramatically — two weeks' notice, proper handover meetings, a shared document titled "Infrastructure Notes" that turned out to be six bullet points and a broken diagram link. The team assumed the cloud environment was documented somewhere. It wasn't. Over the following weeks, they discovered that most of the production environment — load balancers, security groups, IAM roles, database configurations, cron jobs, DNS records — had been configured manually through the AWS console. No Terraform. No CloudFormation. No code of any kind. Reproducing the environment for a new region took three months of reverse-engineering, and they still weren't confident they'd caught everything.
This is not an unusual story. It is the default outcome when infrastructure is built without an infrastructure as code (IaC) standard. Every click in a cloud console is a decision that exists only in the memory of the person who made it. When that person leaves, takes a holiday, or simply forgets, the decision is gone. The infrastructure remains, but the reasoning behind it (why this instance type, why this security rule, why this subnet configuration) disappears with them.
EB Pearls™ has delivered over 900 projects across software development and DevOps engagements since 2004. The pattern is consistent: teams that treat IaC as optional in year one treat it as an emergency in year two. Infrastructure you can't reproduce from code is infrastructure you can't audit, can't hand over, and can't trust.
Why Manually Configured Infrastructure Becomes a Liability
Manual infrastructure configuration doesn't fail on the day it's created. It fails on the day someone else needs to understand it. That delay — between creation and consequence — is what makes it so dangerous. Everything works fine until it doesn't, and by then the cost of fixing it has compounded.
Single-person dependency. When infrastructure is configured by hand, knowledge of that infrastructure lives in one person's head. Their departure, illness, or unavailability becomes an operational risk. The team can see what exists in the cloud console, but they can't see why it exists, what depends on it, or what would break if it changed. According to HashiCorp's State of Cloud Strategy Survey, a significant majority of organisations identify manual processes and lack of automation as a primary barrier to multi-cloud operations. The problem isn't a shortage of cloud skills — it's that manual processes create knowledge silos that no amount of documentation fully resolves.
Audit impossibility. When a security incident occurs, one of the first questions is: what changed, when, and who approved it? If infrastructure changes happen through a console, there is no review process, no approval chain, and no diff to inspect. Cloud provider audit logs capture that a change was made, but they don't capture the intent or the review that should have preceded it. For organisations pursuing ISO 27001, SOC 2, or similar certifications, this gap is a compliance failure. EB Pearls holds ISO 9001 and ISO 27001 certification, and the controls that satisfy those standards begin with infrastructure defined and reviewed in code.
Environment reproduction failure. You need a disaster recovery environment. You need to expand to a new region. You need a staging environment that genuinely mirrors production. If your infrastructure isn't defined in code, each of these requires someone to manually recreate every resource, hoping they remember every configuration detail. The result is environments that look similar but differ in ways that only surface during an incident — the worst possible time to discover a discrepancy.
Drift without detection. Even when infrastructure starts in a known state, manual changes accumulate over time. A hotfix that adjusts a security group. A quick database parameter change to resolve a performance issue. A temporary rule that becomes permanent. Without code as the source of truth, there is no mechanism to detect that the actual state has diverged from the intended state. Drift is silent, cumulative, and eventually catastrophic.
What an Infrastructure as Code Standard Actually Requires
An infrastructure as code standard is the organisational commitment that every cloud resource — without exception — is defined in version-controlled code, provisioned through automated pipelines, and never modified manually in production. It is not a suggestion or a best practice. It is a standard, with the same weight as your coding standards or your security policies.
This means every resource. Not just the compute instances. Not just the "important" ones. Every load balancer, every DNS record, every IAM policy, every database parameter group, every CloudWatch alarm, every S3 bucket policy. The moment you allow exceptions, the exceptions multiply until the standard is meaningless.
Terraform as the Foundation
Terraform has become the dominant tool for infrastructure as code for a reason. It works across cloud providers through a plugin model, it is declarative, and it keeps a state file that maps your configuration to the real resources it manages. You define what you want. Terraform compares that definition with the recorded state, refreshed against the provider's APIs, and determines what needs to change to get there. The plan-apply workflow gives you a preview of every change before it executes: the infrastructure equivalent of a code review diff.
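Here is a toy reconciliation sketch in Python that makes the declarative model concrete. It is not how Terraform is implemented. It assumes each resource is just an address mapped to a flat dictionary of attributes, and it treats every difference as an in-place update. A real plan also handles replacements, dependency ordering, and values that are only known after apply. The resource addresses are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Resource address -> attribute map, e.g. "aws_instance.api" -> {"instance_type": "t3.small"}
Config = Dict[str, Dict[str, Any]]


def plan(desired: Config, current: Config) -> List[Tuple[str, str]]:
    """Return (action, address) pairs that would move `current` to `desired`.

    Toy model only: any attribute difference is reported as an in-place
    update, whereas a real tool may need to destroy and recreate a resource.
    """
    actions: List[Tuple[str, str]] = []
    for address in sorted(desired.keys() | current.keys()):
        if address not in current:
            actions.append(("create", address))
        elif address not in desired:
            actions.append(("delete", address))
        elif desired[address] != current[address]:
            actions.append(("update", address))
    return actions


if __name__ == "__main__":
    desired = {
        "aws_instance.api": {"instance_type": "t3.small"},
        "aws_s3_bucket.logs": {"versioning": True},
    }
    current = {
        "aws_instance.api": {"instance_type": "t3.micro"},
        "aws_instance.legacy": {"instance_type": "t3.micro"},
    }
    for action, address in plan(desired, current):
        print(f"{action:>6}  {address}")
```

For this input, the sketch reports an update for aws_instance.api, a delete for aws_instance.legacy, and a create for aws_s3_bucket.logs. The point is the shape of the workflow: the actions are derived from the difference between desired and actual, so nobody has to remember them.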
But Terraform is a tool, not a standard. The standard is the set of practices around the tool: mandatory code review for all infrastructure changes, automated validation in CI pipelines, state file management with remote backends and locking, module reuse to enforce consistency, and policy-as-code guardrails using tools like Open Policy Agent or Sentinel.
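Open Policy Agent and Sentinel each have their own policy languages. As a language-neutral illustration, the sketch below applies the same guardrail idea in plain Python. It works on the JSON that terraform show -json produces for a saved plan file. It assumes the AWS provider's attribute names: storage_encrypted on aws_db_instance, ingress cidr_blocks on aws_security_group, and the four flags on aws_s3_bucket_public_access_block. Check these against the provider version you use. Attributes that are unknown until apply are treated as non-compliant, which is deliberately conservative.

```python
import json
import sys
from typing import Any, Dict, List

# Only resources that will still exist after apply are checked.
SKIP_ACTIONS = (["no-op"], ["read"], ["delete"])
PUBLIC_ACCESS_FLAGS = ("block_public_acls", "block_public_policy",
                       "ignore_public_acls", "restrict_public_buckets")


def check_policies(plan_json: str) -> List[str]:
    """Return violations found in the output of `terraform show -json <planfile>`."""
    violations: List[str] = []
    for rc in json.loads(plan_json).get("resource_changes", []):
        change: Dict[str, Any] = rc.get("change", {})
        if change.get("actions") in SKIP_ACTIONS:
            continue
        after: Dict[str, Any] = change.get("after") or {}
        address, rtype = rc.get("address", "?"), rc.get("type")

        # Rule 1: databases must be encrypted at rest.
        if rtype == "aws_db_instance" and after.get("storage_encrypted") is not True:
            violations.append(f"{address}: storage_encrypted must be true")

        # Rule 2: no security group ingress open to the whole internet.
        if rtype == "aws_security_group":
            for rule in after.get("ingress") or []:
                if "0.0.0.0/0" in (rule.get("cidr_blocks") or []):
                    violations.append(f"{address}: ingress allows 0.0.0.0/0")

        # Rule 3: S3 public access blocks must enable all four protections.
        if rtype == "aws_s3_bucket_public_access_block":
            if not all(after.get(flag) is True for flag in PUBLIC_ACCESS_FLAGS):
                violations.append(f"{address}: all public access block flags must be true")
    return violations


if __name__ == "__main__":
    # Typical input: terraform plan -out=tfplan && terraform show -json tfplan > plan.json
    with open(sys.argv[1]) as fh:
        problems = check_policies(fh.read())
    for problem in problems:
        print("POLICY VIOLATION:", problem)
    sys.exit(1 if problems else 0)
```

In a pipeline, a non-zero exit fails the job before anything is applied. A production setup would more likely express these rules in OPA or Sentinel, where they are versioned and reviewed like any other code.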
The No-Console Rule
The most important rule in an IaC standard is also the hardest to enforce: nobody modifies production infrastructure through the cloud console. Read access is fine. Debugging and investigation through the console is expected. But changes — any changes — go through code, review, and pipeline.
This is hard because the console is fast. When production is down and the fix is a single security group rule change, the temptation to click rather than commit is enormous. The standard must account for this by making the code-to-deploy path fast enough that it doesn't feel like a bottleneck during incidents. If your Terraform pipeline takes forty-five minutes, people will bypass it. If it takes five minutes, they won't.
What Gets Defined in Code
At minimum, the IaC standard covers: compute resources (instances, containers, serverless functions), networking (VPCs, subnets, route tables, security groups, load balancers), data stores (databases, caches, object storage, message queues), identity and access (IAM roles, policies, service accounts), DNS and certificates, monitoring and alerting configuration, and CI/CD pipeline definitions. If a resource exists in your cloud account and it's not in your Terraform state, it's a compliance violation.
Where It Fails
IaC standards fail when they're introduced without investment in the developer experience around them. If writing Terraform is slow, confusing, or poorly documented for your team, people will work around it. The standard requires shared modules, clear documentation, fast pipelines, and a team that understands both the tools and the reasoning. It also fails when leadership treats it as an engineering preference rather than a business requirement. IaC is not about engineering elegance — it's about operational continuity, auditability, and the ability to hand infrastructure to a different team without a three-month knowledge transfer.
How to Implement an IaC Standard From the Ground Up
Implementing an IaC standard is a sequential process with clear milestones. You don't need to codify everything on day one, but you need a plan that reaches full coverage within a defined timeframe. Here's how this works when we run it through our project delivery framework.
Phase one: inventory and baseline (weeks one to two). Catalogue every resource in your cloud accounts. Tools like Terraformer or AWS Config can generate an inventory. Compare this against any existing Terraform code. The gap between what exists and what's codified is your scope of work. This is often a sobering exercise: teams routinely discover that less than half their infrastructure is actually defined in code.
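Once you have both lists, the gap is a set difference. This sketch makes two assumptions. First, the inventory is a list of ARNs, for example exported from AWS Config. Second, managed resources are identified by the arn attribute in the state JSON that terraform show -json prints. Matching on ARNs is a simplification. Some resource types, such as individual security group rules and DNS records, don't expose an ARN, so a real reconciliation needs per-type matching. The input file names are hypothetical.

```python
import json
import sys
from typing import Any, Dict, Iterable, Set, Tuple


def managed_arns(state: Dict[str, Any]) -> Set[str]:
    """Collect ARNs from `terraform show -json` state output, walking child modules too."""
    arns: Set[str] = set()
    stack = [state.get("values", {}).get("root_module", {})]
    while stack:
        module = stack.pop()
        for resource in module.get("resources", []):
            arn = resource.get("values", {}).get("arn")
            if arn:
                arns.add(arn)
        stack.extend(module.get("child_modules", []))
    return arns


def coverage_gap(inventory: Iterable[str], managed: Iterable[str]) -> Tuple[Set[str], float]:
    """Return (unmanaged ARNs, fraction of the inventory that is codified)."""
    inv = set(inventory)
    codified = inv & set(managed)
    coverage = len(codified) / len(inv) if inv else 1.0
    return inv - codified, coverage


if __name__ == "__main__":
    # Hypothetical inputs: state.json from `terraform show -json`, inventory.txt with one ARN per line.
    with open(sys.argv[1]) as fh:
        state = json.load(fh)
    with open(sys.argv[2]) as fh:
        inventory = [line.strip() for line in fh if line.strip()]
    unmanaged, coverage = coverage_gap(inventory, managed_arns(state))
    print(f"{coverage:.0%} of inventoried resources are codified")
    for arn in sorted(unmanaged):
        print("UNMANAGED", arn)
```

The UNMANAGED list doubles as the work queue for phase two.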
Phase two: codify existing infrastructure (weeks two to six). Import existing resources into Terraform state and write the corresponding configuration. This is painstaking work, because each resource needs to be imported, its configuration captured accurately, and the result validated against reality. Prioritise by risk: production databases and networking first, development environment convenience resources last.
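Since Terraform 1.5, imports can be declared in configuration with import blocks, so the import itself goes through review like any other change. The sketch below renders those blocks from a list of (resource address, cloud ID) pairs. It assumes IDs contain no characters that need escaping in HCL strings. The expected ID format is resource-type specific, so check each resource's import section in the provider documentation. The addresses and IDs in the example are hypothetical. Terraform 1.5 also introduced, as an experimental option, a -generate-config-out flag on terraform plan that drafts configuration for imported resources. Confirm its status in your version's documentation before relying on it.

```python
from typing import Iterable, Tuple


def import_blocks(resources: Iterable[Tuple[str, str]]) -> str:
    """Render one Terraform import block (Terraform 1.5+) per (address, cloud ID) pair.

    Assumes IDs are plain strings with no quotes, backslashes or "${" sequences.
    """
    blocks = []
    for address, cloud_id in resources:
        blocks.append(f'import {{\n  to = {address}\n  id = "{cloud_id}"\n}}\n')
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical addresses and IDs; in practice they come from the phase-one inventory.
    print(import_blocks([
        ("aws_security_group.web", "sg-0123456789abcdef0"),
        ("aws_db_instance.main", "prod-db"),
    ]))
```

Committing the generated blocks alongside the hand-written resource configuration lets a reviewer see exactly which existing resources are being brought under management in each pull request.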
Phase three: establish the pipeline (weeks three to four, in parallel with phase two). Build the CI/CD pipeline for infrastructure changes. Every pull request runs terraform plan automatically and posts the output for review. Merges to the main branch trigger terraform apply to staging. Production applies require explicit approval. The pipeline must be fast: in most cases, under ten minutes for a plan and under fifteen for an apply.
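The pull-request step can start by turning the machine-readable plan into a one-line summary. This sketch reads the output of terraform show -json for a saved plan file and counts changes using each entry's change.actions list. It treats a delete-plus-create pair as a replacement. Posting the comment to the pull request depends on your CI system and is left out.

```python
import json
import sys
from collections import Counter
from typing import Dict

ORDER = ("create", "update", "replace", "delete")


def summarize_plan(plan_json: str) -> Dict[str, int]:
    """Count planned changes by kind from `terraform show -json <planfile>` output."""
    counts: Counter = Counter()
    for rc in json.loads(plan_json).get("resource_changes", []):
        actions = rc["change"]["actions"]
        if actions in (["no-op"], ["read"]):
            continue  # nothing changes for these resources
        if "create" in actions and "delete" in actions:
            counts["replace"] += 1  # ["delete", "create"] or ["create", "delete"]
        else:
            counts[actions[0]] += 1  # "create", "update" or "delete"
    return dict(counts)


def pr_comment(counts: Dict[str, int]) -> str:
    """Render a one-line summary and flag anything that destroys a resource."""
    line = ", ".join(f"{counts.get(kind, 0)} to {kind}" for kind in ORDER)
    destructive = counts.get("delete", 0) + counts.get("replace", 0)
    warning = " Destructive changes present: review carefully." if destructive else ""
    return f"Plan: {line}.{warning}"


if __name__ == "__main__":
    with open(sys.argv[1]) as fh:  # e.g. plan.json from: terraform show -json tfplan
        print(pr_comment(summarize_plan(fh.read())))
```

Surfacing replacements separately matters because a replace on a database or load balancer can mean downtime, even when the diff looks small.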
Phase four: enforce the standard (ongoing). Implement drift detection that runs daily and alerts when actual infrastructure diverges from the Terraform state. Restrict console write access to break-glass scenarios with mandatory post-incident reconciliation. According to the Terraform documentation on state management, consistent state management is the foundation of reliable infrastructure automation. Add policy-as-code checks that block non-compliant resources — no public S3 buckets, no overly permissive security groups, no unencrypted databases.
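A basic daily drift check can rely on the documented exit codes of terraform plan -detailed-exitcode: 0 when there are no changes, 1 on error, and 2 when changes are present. This sketch assumes a scheduler runs it against a checkout of the already-applied main branch. Otherwise, exit code 2 might only mean merged code that hasn't been applied yet, not manual drift. Alert routing is out of scope: the script exits non-zero and leaves notification to the scheduler.

```python
import subprocess
import sys

# Exit codes documented for `terraform plan -detailed-exitcode`.
STATUS_BY_EXIT_CODE = {0: "in-sync", 1: "error", 2: "drift"}


def classify(exit_code: int) -> str:
    """Map a plan exit code to a drift status; unexpected codes count as errors."""
    return STATUS_BY_EXIT_CODE.get(exit_code, "error")


def check_drift(workdir: str) -> str:
    """Run a non-interactive plan in workdir and classify the result."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    if result.returncode not in (0, 2):
        print(result.stderr, file=sys.stderr)
    return classify(result.returncode)


if __name__ == "__main__":
    status = check_drift(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(f"drift check: {status}")
    sys.exit(0 if status == "in-sync" else 1)
```

Treating unexpected exit codes as errors rather than as "in-sync" keeps a broken check from passing silently.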
Phase five: continuous improvement. Refactor into reusable modules. Implement cost estimation in the pipeline using tools like Infracost. Add blast radius analysis so reviewers understand the scope of each change. Build runbooks for common infrastructure operations so the IaC workflow is the fastest path, not a bureaucratic detour.
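Blast radius analysis can start modestly. Group every resource a plan will touch by the module it belongs to, so a reviewer can see at a glance that a change reaches, say, the networking module as well as the application. This sketch reads the same terraform show -json plan output. It relies on the module_address field, which entries inside modules carry and root-module entries omit. Module names are whatever your configuration uses.

```python
import json
import sys
from collections import defaultdict
from typing import Dict, List


def blast_radius(plan_json: str) -> Dict[str, List[str]]:
    """Group addresses of resources a plan will change by module ("(root)" for the root module)."""
    by_module: Dict[str, List[str]] = defaultdict(list)
    for rc in json.loads(plan_json).get("resource_changes", []):
        if rc["change"]["actions"] in (["no-op"], ["read"]):
            continue
        # module_address is omitted for resources in the root module.
        by_module[rc.get("module_address", "(root)")].append(rc["address"])
    return dict(by_module)


if __name__ == "__main__":
    with open(sys.argv[1]) as fh:  # e.g. plan.json from: terraform show -json tfplan
        for module, addresses in sorted(blast_radius(fh.read()).items()):
            print(f"{module}: {len(addresses)} resource(s) changing")
            for address in addresses:
                print(f"  {address}")
```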
For teams building new products from concept to launch, the advantage is that you start clean. There's no import phase. You define the standard before the first resource is provisioned and maintain it from day one.
The Three-Month Reverse-Engineering Project That Could Have Been a Repository Clone
The composite scenario from the opening — the engineer who left, the manually configured environment, the three-month reproduction effort — had a specific cost structure worth examining.
The team spent the first two weeks just cataloguing what existed. They mapped seventy-three distinct resources across compute, networking, storage, and security, none documented and none codified. Weeks three through six were spent writing Terraform configuration for the most critical resources: the production database cluster, the VPC and subnet architecture, the load balancer configuration. They discovered four security group rules that nobody could explain and two IAM roles with permissions far broader than necessary.
Weeks seven through twelve were consumed by edge cases. A Lambda function, triggered by a CloudWatch event, that processed billing data. An S3 lifecycle policy that archived logs to Glacier. A Route 53 health check that routed traffic away from an endpoint that no longer existed. Each discovery required investigation to determine whether it was still needed, what depended on it, and how to represent it accurately in code.
The total engineering cost exceeded what it would have cost to implement IaC from day one by a factor of roughly four. And the result was still imperfect — the team acknowledged that their Terraform state probably didn't capture everything.
Contrast this with a handover on a project where IaC was the standard from the start. The incoming team cloned the repository, read the module structure, ran terraform plan to confirm the state matched reality, and were operationally capable within a week. Not because they were better engineers, but because the infrastructure was legible. The code was the documentation.
When to Invest in an IaC Standard — and When to Accept the Debt
Invest now if your infrastructure supports production workloads with real users. The moment your environment serves external traffic, the risk of manual configuration exceeds the cost of codifying it. This is true for mobile app backends, SaaS platforms, e-commerce systems, and any product where downtime has a commercial cost.
Invest now if you have more than one person touching infrastructure. A solo developer who manually configures everything and never plans to leave or scale the team can theoretically get away with it. The moment a second person needs to understand, modify, or reproduce the environment, the lack of code becomes a bottleneck.
Invest now if you face compliance requirements. ISO 27001, SOC 2, HIPAA, and PCI DSS all expect auditable change management. Infrastructure defined in code with mandatory review produces that audit trail as a by-product: every change has an author, a reviewer, and a diff. Infrastructure configured through a console requires bolting on compensating controls that are harder to maintain and easier to bypass.
Accept the debt temporarily if you're in a genuine prototype phase — disposable infrastructure for a concept test that will be rebuilt from scratch. But set a clear trigger: the moment the prototype takes real traffic or real data, the IaC standard applies.
Broader industry trends point the same way. As infrastructure grows more complex (multi-cloud, edge compute, serverless alongside containers), the cost of managing it manually increases while the tooling for managing it in code improves. The gap between codified and manual infrastructure only widens.
What to Do Next
Start with a single question: if your production environment were destroyed tonight, could you rebuild it from code by morning? If the answer is no — or even "probably, but we'd need to check a few things manually" — you have infrastructure debt that is accumulating risk.
The path forward is straightforward. Inventory what you have. Codify what exists. Establish the pipeline and the review process. Enforce the standard with drift detection and access controls. This is not a six-month project for most teams. It's a four-to-six-week sprint with compounding returns every week after.
When you're ready to establish an infrastructure as code standard that makes your environment reproducible, auditable, and team-independent, talk to our DevOps team. We'll start with the inventory and have your first Terraform pipeline running before the end of sprint one.
Frequently Asked Questions
What is an infrastructure as code standard?
How long does it take to implement IaC for an existing environment?
What is Terraform and why is it the most common IaC tool?
Can we enforce IaC without blocking emergency fixes?
What happens to infrastructure changes that aren't in code?
They become invisible risk. A manual change to a security group might fix an immediate problem but create a discrepancy between your Terraform state and reality. The next terraform apply could revert the change, causing an outage. Or the change could persist undetected, creating a security posture that doesn't match your documented controls. Drift detection tools compare actual cloud state against your Terraform state and alert on discrepancies. Without this, manual changes accumulate silently until they surface as incidents.
How does IaC relate to compliance frameworks like ISO 27001?
Renji strives for excellence, inspiring teams to grow and improve both professionally and personally, fostering motivation in and outside of work.