Sep 22, 2021

Infrastructure as Code – The Wrong Way

You are probably familiar with the term “infrastructure as code”. It’s a great concept, and it’s gaining steam in the industry. Unfortunately, just as the industry had a lot to learn about writing clean application code, it’s easy to write infrastructure code the wrong way. Here are a few common ways that we’ve seen infrastructure code done wrong, and some ways you can do better.

1. Use latest dependencies

The wrong way: “Don’t peg your dependencies to specific versions, that’s too much trouble! You always want the latest and greatest! If you just set your dependencies to latest, then you’ll get the best version every time you build. This works with Node packages, Docker base images, Python packages, and Java libraries. Even if you pin a version, don’t worry about hashsums. Nobody will replace a tagged image unless it’s really necessary.”

A better way:

  • Peg your dependencies to a particular version. This will help ensure you get the same result whether you build your project today, tomorrow, or six months from now.
  • Use a lockfile if your system supports it—this will ensure that the package or image will match the desired hashsum.
  • Install all of your dependencies into your image at build time—avoid running a package install when a server instance comes online, so a failed or drifted install can’t break a new instance at runtime.
  • From a security perspective, it can be valuable to use tooling that identifies insecure versions of libraries at both build time and runtime. This helps you catch potential vulnerabilities quickly and fix them with a discrete, reviewable change in source control.
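To make the lockfile point concrete, here is a minimal Python sketch of what a lockfile-driven install is doing under the hood: comparing the bytes you actually downloaded against the hash you pinned. The function name and artifact path are made up for illustration.

```python
import hashlib

def verify_artifact(path: str, expected_sha256: str) -> bool:
    """Compare a downloaded artifact against the hash pinned in a lockfile."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large artifacts don't need to fit in memory.
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256
```

If the tag was silently repointed upstream, the bytes change, the hash no longer matches, and the build fails loudly instead of shipping a surprise.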

2. Write big scripts

The wrong way: “Write a quick 2-line Bash script that will launch your infrastructure. Infrastructure As Script! Oh, you need to tear down your infrastructure? We can write another script that will delete it. Need to update the infrastructure, well, let’s write an if-block…”

Your code will quickly start to look something like this (pseudo-code):

asg = get_auto_scaling_group(...)
if !asg.exists() {
   asg = new AutoScalingGroup(min_size=2, max_size=5)
   create_auto_scaling_group(asg)
} else if asg.min_size != 2 {
   asg.min_size = 2
   update_auto_scaling_group(asg)
}
....

This code is not very testable, readable, or maintainable, because it manages state with imperative logic.

A better way:

I prefer the term “declarative infrastructure”. With a tool like Terraform, Kubernetes resources, or CloudFormation, you define the desired state; the tool compares it to the current state and performs whatever actions are necessary. That ends up looking like this:

resource "aws_autoscaling_group" "bar" {
  name                      = "foobar3-terraform-test"
  min_size                  = 2
  max_size                  = 5
  ...
}

  • Rather than a script or application full of imperative commands, build declarative configuration and use a tool that figures out what it needs to do to get there.
  • This avoids complex logic, which in turn helps you avoid bugs.
  • Kubernetes codifies this model with an API built around declarative resources (which is why imperative CLI commands such as kubectl expose or kubectl run are counterproductive, and arguably shouldn’t exist).
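To see why the declarative model avoids the if/else sprawl above, here is a toy reconciler in Python. The diffing rules and resource shapes are invented for illustration; real tools like Terraform and Kubernetes controllers implement far more sophisticated versions of this loop.

```python
def reconcile(desired: dict, current: dict) -> list:
    """Compute the actions needed to move current state toward desired state."""
    actions = []
    for name, spec in desired.items():
        if name not in current:
            actions.append(f"create {name}")
        elif current[name] != spec:
            actions.append(f"update {name}")
    # Anything present but not declared should be removed.
    for name in current:
        if name not in desired:
            actions.append(f"delete {name}")
    return actions
```

You declare `{"asg": {"min_size": 2, "max_size": 5}}` once; create, update, and delete all fall out of the same diff, with no hand-written branching per case.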

3. Deploy your infrastructure from your workstation

The wrong way: “Why wait for Terraform to run on a CI server? You can run it on your laptop. Don’t worry about the Terraform state – it can be locked. You’ll always remember to commit your local source code! Somebody will eventually review the PR and merge it. And everybody else will have the same tools, access, and local environment that you have.”

In reality, you know that folks make mistakes. They forget to commit a change they’ve already applied. They accidentally use the wrong version of a tool such as Terraform. They have different access than others on the team.

A better way:

Run your infrastructure through a CI tool such as GitHub Actions, Azure DevOps Pipelines, GitLab, or any similar process.

  • On PR builds, run a lint, dry-run, terraform plan – anything you can to help validate that it will work.
  • Deploy your infrastructure from CI to ensure that the same tool versions and the same credentials are used every time, and that there is an auditable change history.
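As a sketch, PR-time validation might look like the following hypothetical GitHub Actions workflow. The step choices and action versions are illustrative, not prescriptive:

```yaml
# Hypothetical workflow: validate Terraform on every pull request.
name: terraform-plan
on: pull_request
jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: hashicorp/setup-terraform@v1
      - run: terraform init
      - run: terraform fmt -check
      - run: terraform validate
      - run: terraform plan
```

The apply step would live in a separate job gated on merge to the main branch, so every change that reaches production went through review first.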

4. Run everything locally

The wrong way: “The only way to be sure that everything will work is to run everything locally! So let’s run Kubernetes, an API gateway, Kafka, databases, and every microservice in the company in a local virtual machine!”

If you choose to unravel this ball of string, eventually you’ll be running the whole internet on your laptop. Obviously, that’s not feasible. Testing this way is the sort of “end-to-end” test that sits at the top of the test pyramid — slower, more expensive, and more difficult to troubleshoot when something goes wrong.

A better way:

I prefer to let engineers run software the way they prefer to run it – often, this is with IDE tools. Requiring local applications to run in a local Kubernetes cluster or even a Docker Compose stack can make it difficult to use native debuggers or test frameworks.

And if you are attempting to “run the world” locally, consider what problem you are actually trying to solve. Is it that you want to get the Kubernetes config correct? Consider running it through JSON schema validation. Do you need to interact with an external service? Consider stubbing or mocking out that service.

Whole books could be written on this, but my rule of thumb is to try and reduce engineer cycle time as much as possible. If they are writing application code, they should be able to iterate quickly, locally, with unit tests or local “poking around” builds. If they are interacting with another API, they should be able to test against some published contract or specification.
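As an example of the “stub out the external service” approach, here is a Python sketch using unittest.mock. The fetch_quota function and its client API are hypothetical stand-ins for whatever your code actually calls:

```python
from unittest.mock import Mock

def fetch_quota(client, account_id: str) -> int:
    """Business logic that would normally hit a remote API."""
    response = client.get(f"/accounts/{account_id}/quota")
    return response["limit"] - response["used"]

# In a test, replace the client with a stub that returns canned data,
# so no gateway, Kafka, or shared environment is needed to iterate:
stub = Mock()
stub.get.return_value = {"limit": 10, "used": 3}
assert fetch_quota(stub, "acct-1") == 7
```

The logic gets exercised in milliseconds, locally, with a debugger attached if you want one, and the real service only needs to be hit in a thin contract test.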

5. Use YAML

The wrong way: “All configurations should be in YAML. Even better—put YAML inside of your YAML. Best—Template your YAML inside your YAML”

Gross! Listen, I use YAML as much as the next person, but we have to recognize that we have a problem. YAML is easier to read (and template) than JSON, but it has a lot of shortcomings. Here are some fun examples I run into regularly:

  • boolean or numeric values used where a string is expected—such as an environment variable value. These will cause errors when the YAML is deserialized unless they are quoted.
  • AWS account IDs with a leading zero, such as 0233849835: some parsers interpret these as a number, which drops the leading zero (233849835).
  • 1.2.0 is interpreted as a string, "1.2.0", but 1.2 is interpreted as the numeric value 1.2.
  • 042 is interpreted by some parsers as an octal number, and translated into the decimal 34.

There are just a lot of ways that YAML can go wrong.
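One way to catch these coercions early is a post-parse check. This Python sketch assumes you have already loaded the YAML into a dict with your parser of choice; the keys and values are made up for illustration:

```python
def non_string_values(env: dict) -> dict:
    """Flag env-var values that a YAML parser silently coerced."""
    return {k: v for k, v in env.items() if not isinstance(v, str)}

# Unquoted `ENABLED: yes` and `VERSION: 1.2` arrive as a bool and a float:
parsed_env = {"ENABLED": True, "VERSION": 1.2, "NAME": "api"}
```

Running the check over `parsed_env` surfaces ENABLED and VERSION as values that needed quoting, before they ever reach a container that expects strings.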

A better way:

As mentioned above—schemas are available for Kubernetes APIs and Helm chart values, and you should use them. Insert validation of your YAML files in the “validate” phase of the pipeline so that you can catch issues before merging to the main branch.

And maybe consider using JSON or another data format!

Conclusion

Hope you enjoyed this post—I had fun writing it and delivering it as a presentation! We run into all of these sins from time to time, and I have perhaps committed most of them myself. But the important thing is that we learn from the mistakes and do better next time!

