Jan 21, 2021

Why we started using JSON Schema in our internal Helm charts

Helm 3 supports validating the provided chart values against a JSON Schema. While it may be quicker to get started in your chart development without a schema, we found it valuable for a number of reasons.

I won’t get into the nitty-gritty of JSON Schema, or even exactly how to use it with Helm — the documentation does a good job of that. Here, I want to tell you why we used it, and I hope to convince you that you should too.

Client background

My client’s team had developed a standard application deployment Helm chart that encapsulated the standards for deploying on our Kubernetes platform. While not required in order to use the platform, this Helm chart offered a convenient abstraction over the container platform, with sane defaults. The values that teams provided to the Helm chart configured their application, integration tests, and related infrastructure.

The values for the Helm chart, then, essentially became the de facto API for deploying to the Kubernetes cluster. This pattern in itself can be debated (perhaps CRDs and an associated controller could work better? That’s something we’re looking at), but it’s the situation we found ourselves in with dozens of applications deployed to the platform.

So when Helm 3 introduced JSON Schema support, we jumped on it. When Helm encounters a values.schema.json file in a chart, it validates the provided values against that schema – whether in install, upgrade, lint, or template operations. This was immediately helpful in validating values as developers transitioned to a new version of the chart.

As a very simple example, here is the output of helm lint failing when it is provided a values file that does not conform to the required schema:

$ helm lint --values tests/no_name.yaml
==> Linting .
[ERROR] values.yaml: - app.0: name is required

[ERROR] templates/: values don't meet the specifications of the schema(s) in the following chart(s):
my-app-chart:
- app.0: name is required

Error: 1 chart(s) linted, 1 chart(s) failed
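
The chart’s full schema isn’t shown here, but judging from the error message, the relevant part of its values.schema.json would look something like this (a sketch reconstructed from the output above, not the actual file):

{
  "type": "object",
  "properties": {
    "app": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["name"],
        "properties": {
          "name": { "type": "string" }
        }
      }
    }
  }
}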

Benefit #1: Validating chart upgrades

As with any API, we made assumptions and learned a lot as we iterated. As with any API, breaking changes eventually had to be made. Helm charts are versioned, so we were able to control this in a reasonable way, but it was still a hassle for teams to upgrade to a new version of the chart. By adding a JSON Schema, development teams were able to easily validate that their values still matched the required schema.
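
For example, a team could dry-run its existing values against the new chart version before upgrading. Something like the following would surface any schema violations (the chart path and values file name here are hypothetical):

# chart path and values file are hypothetical examples
$ helm lint ./my-app-chart --values my-app-values.yaml
# helm template also runs schema validation, without touching the cluster
$ helm template ./my-app-chart --values my-app-values.yaml > /dev/null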

Benefit #2: Cleaner required properties, enumerations

Before JSON Schema, enforcing requirements on values took some fairly verbose Helm templating logic:

# deployment.yaml
{{- if not (hasKey .Values "visibility")}}
    {{- fail "'visibility' is a required value!"}}
{{- end}}
{{- if not (eq "public" .Values.visibility)}}
    {{- if not (eq "internal" .Values.visibility)}}
        {{- fail (printf "Visibility type '%s' not recognized. Must be public or internal" .Values.visibility)}}
    {{- end}}
{{- end}}

Rather than specifying required properties using complicated Helm templating logic, you can specify required properties declaratively using the schema. In my experience, this is a much cleaner way to institute these requirements:

{
  "type": "object",
  "required": ["visibility"],
  "properties": {
    "visibility": {
      "type": "string",
      "enum": ["public", "internal"]
    }
  }
}

Benefit #3: Protecting against YAML silliness

Like most Helm users, we provide the values as YAML. While YAML is relatively easy to read, it has some common problems that plague its users, such as:

  • Indentation
    • When developers are trying to update a YAML file, they may inadvertently insert a key at the wrong level, especially if there are multiple nested levels. Without a schema, that key is generally ignored by Helm. By enforcing a schema, you can catch this up front and avoid frustrating troubleshooting at deployment time.
  • Types
    • YAML will parse Tru as a string, but Yes, True, False, or Off as booleans. It will also parse 01234214124 (e.g., an AWS account ID) as a number, dropping the leading zero: 1234214124. And it will parse 1.5.0 as a string, but 1.5 as a number.
    • By enforcing a schema, you can catch these issues up front.

This is by no means an exhaustive list — the internet is replete with examples of problems with YAML. But for most of them, a schema lets you catch the problem up front, before it turns into frustrating troubleshooting at deployment time.
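
As a sketch of how a schema guards against both indentation slips and type coercion, consider something like the following (the property names here are hypothetical; the description fields carry the inline notes, since JSON has no comments):

{
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "awsAccountId": {
      "description": "Hypothetical property: requiring a string forces users to quote the ID, preserving any leading zero",
      "type": "string",
      "pattern": "^[0-9]{12}$"
    },
    "enabled": {
      "description": "Hypothetical property: a typo like Tru parses as a string and fails boolean validation",
      "type": "boolean"
    }
  }
}

With additionalProperties set to false, a key inserted at the wrong indentation level shows up as a validation error instead of being silently ignored.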

Benefit #4: Lintable/Testable documentation

We wrote some build-time tools to automate testing the code blocks in our Markdown documentation. That tooling is a blog topic for another day, but by adding the schema to our chart, we were able to catch issues in our documentation more quickly whenever it didn’t match up with the implementation.
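
A rough sketch of the idea, assuming the chart’s README carries a single fenced yaml block showing example values (file names and layout are hypothetical):

# Extract the YAML example from the README and lint it against the
# chart's schema, failing the build if the docs have drifted.
awk '/^```yaml$/{f=1;next} /^```$/{f=0} f' README.md > /tmp/readme-values.yaml
helm lint . --values /tmp/readme-values.yaml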

Adding JSON Schema after the fact

Since we were adding the schema after many developers in our organization were already using the tool, we wanted to prevent it from becoming an issue for our existing users. Thankfully, Helm stores the values used as part of the release state in the cluster (in Helm 3, this is a Secret resource by default).

Utilizing commands such as helm ls -o json -A and helm status -n my-namespace my-release -o json | jq .config > my-namespace-my-release-values.json (this assumes you have jq installed), we were able to write some scripts to aid us in development:

  1. By getting the actual values in use in production and lower environments, it helped us to ensure we didn’t miss any attributes as we wrote the JSON Schema.
  2. It also helped us to identify places where users had inadvertently added attributes that were never supported, kept attributes that were no longer supported, or used types that could be interpreted incorrectly by the YAML processor (see “YAML silliness”, above). By identifying these issues, we were able to submit PRs and/or document potential issues in the upgrade notes.
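
A minimal sketch of the kind of script this enabled, built from the commands above (output file names are illustrative):

# For every release in the cluster, dump the user-supplied values so
# they can be checked against the draft schema.
helm ls -A -o json | jq -r '.[] | [.namespace, .name] | @tsv' |
while IFS=$'\t' read -r ns name; do
  helm status -n "$ns" "$name" -o json | jq .config > "values-$ns-$name.json"
done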

Finally, we considered the introduction of a schema to be a breaking change, and since we follow semantic versioning on our Helm chart, we released it with a major version bump, which indicates to our users that they should look at the chart README and determine whether they need to make any changes.

Summary

Using JSON Schema made working with YAML more reliable, much as using strong types in Groovy or other optionally typed languages can help catch issues earlier in the development process.

While great care should be taken in defining your chart values and thoughtfully making breaking changes, a JSON Schema can help both the chart’s developers and its users in numerous ways.

About the Author


David Norton

Director, Platform Engineering

Passionate about continuous delivery, cloud-native architecture, DevOps, and test-driven development.

  • Experienced in cloud infrastructure technologies such as Terraform, Kubernetes, Docker, AWS, and GCP.
  • Background heavy in enterprise JVM technologies such as Groovy, Spring, Spock, Gradle, JPA, Jenkins.
  • Focus on platform transformation, continuous delivery, building agile teams and high-scale applications.