Jan 21, 2021

Why we started using JSON Schema in our internal Helm charts

Helm 3 supports validating the provided chart values against a JSON Schema. While it may be quicker to get started in your chart development without a schema, we found it valuable for a number of reasons.

I won’t get into the nitty gritty of JSON Schema, or even exactly how to use it with Helm — the documentation does a good job of that. Here, I want to tell you why we used it, and I hope to convince you that you should too.

Client background

My client’s team had developed a standard application deployment Helm chart that encapsulated standards around deployment on our Kubernetes platform. While not necessary for usage on the platform, this Helm chart provided a convenient abstraction to the container platform and provided sane defaults. The values that teams provided to the Helm chart configured their application, integration tests, and related infrastructure.

The values for the Helm chart, then, essentially became the de facto API for deploying to the Kubernetes cluster. This pattern in itself can be debated (perhaps CRDs and an associated controller could work better? That’s something we’re looking at), but it’s the situation we found ourselves in with dozens of applications deployed to the platform.

So when Helm 3 introduced JSON Schema support, we jumped on it. When Helm encounters a values.schema.json file in a chart, it validates the provided values against that schema – whether in install, upgrade, lint, or template operations. This was immediately helpful in validating values as developers transitioned to a new version of the chart.

As a very simple example, here is the output of helm lint failing when it is provided a values file that does not fit the required schema:

$ helm lint --values tests/no_name.yaml
==> Linting .
[ERROR] values.yaml: - app.0: name is required

[ERROR] templates/: values don't meet the specifications of the schema(s) in the following chart(s):
my-app-chart:
- app.0: name is required

Error: 1 chart(s) linted, 1 chart(s) failed
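The error above comes from a schema rule like the following (a sketch, not the chart's actual schema, which is more extensive): the app property is an array of objects, each of which must have a name.

```json
{
  "type": "object",
  "properties": {
    "app": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["name"]
      }
    }
  }
}
```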

Benefit #1: Validating chart upgrades

As with any API, we made assumptions and learned a lot as we iterated. As with any API, breaking changes eventually had to be made. Helm has a versioning system, so we were able to control this in a reasonable way, but it was still a hassle for teams to upgrade to a new version of the chart. By adding a JSON Schema, development teams were able to easily validate that their values still matched the required schema.

Benefit #2: Cleaner required properties, enumerations

Before JSON Schema, values requirements required some fairly verbose Helm templating logic:

# deployment.yaml
{{- if not (hasKey .Values "visibility")}}
        {{- fail "'visibility' is a required value!"}}
{{- end}}
{{- if not (eq "public" .Values.visibility)}}
    {{- if not (eq "internal" .Values.visibility)}}
        {{- fail (printf "Visibility type '%s' not recognized.  Must be public or internal" .Values.visibility)}}
    {{- end}}
{{- end}}

Rather than specifying required properties using complicated Helm templating logic, you can specify required properties declaratively using the schema. In my experience, this is a much cleaner way to institute these requirements:

{
  "type": "object",
  "required": ["visibility"],
  "properties": {
    "visibility": {
      "type": "string",
      "enum": ["public", "internal"]
    }
   }
}
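Helm performs this validation internally with a Go JSON Schema library, but the effect of the required and enum keywords is easy to see in isolation. Here is a standalone Python sketch (not Helm's implementation) that mimics just those two keywords against the schema above:

```python
import json

def validate(values, schema):
    """Minimal check of the 'required' and 'enum' keywords,
    mimicking two of the JSON Schema rules Helm evaluates."""
    errors = []
    for key in schema.get("required", []):
        if key not in values:
            errors.append(f"{key} is required")
    for key, sub in schema.get("properties", {}).items():
        if key in values and "enum" in sub and values[key] not in sub["enum"]:
            errors.append(f"{key} must be one of {sub['enum']}")
    return errors

schema = json.loads("""
{
  "type": "object",
  "required": ["visibility"],
  "properties": {
    "visibility": {"type": "string", "enum": ["public", "internal"]}
  }
}
""")

print(validate({}, schema))                       # missing required key
print(validate({"visibility": "both"}, schema))   # value outside the enum
print(validate({"visibility": "public"}, schema)) # valid
```

Compare this declarative handling to the nested if/fail template blocks above: the schema states the constraint once, and the error messages come for free.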

Benefit #3: Protecting against YAML silliness

Like most Helm users, we provide chart values as YAML. While YAML is relatively easy to read, it has some common problems that plague its users, such as:

  • Indentation
    • When developers are trying to update a YAML file, they may inadvertently insert a key at the wrong level, especially if there are multiple nested levels. Without a schema, that key is generally ignored by Helm. By enforcing a schema, you can catch this up front and avoid frustrating troubleshooting at deployment time.
  • Types
  • YAML will parse Tru as a string, but Yes, True, False, or Off as a boolean. It will also parse 01234214124 (e.g., an AWS account ID) as a number without the leading zero: 1234214124. It will parse 1.5.0 as a string, but 1.5 as a number.
    • By enforcing a schema, you can catch these issues up front.
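A schema fragment can pin these types down. As an illustrative sketch (the field names here are hypothetical, not from our chart), declaring the expected types means a value YAML has silently coerced to the wrong type fails validation loudly instead of deploying incorrectly:

```json
{
  "type": "object",
  "properties": {
    "awsAccountId": {
      "type": "string",
      "pattern": "^[0-9]{12}$"
    },
    "appVersion": {
      "type": "string"
    },
    "enabled": {
      "type": "boolean"
    }
  }
}
```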

This is by no means an exhaustive list; the internet is replete with examples of problems with YAML. But a schema catches most of them up front, before they cause confusion at deployment time.

Benefit #4: Lintable/Testable documentation

We wrote some build-time tools to automate testing the code blocks in our Markdown documentation. While this is a blog topic for another day, by adding the schema to our chart, we were able to more quickly catch issues in our documentation when they didn’t match up with the implementation.

Adding JSON Schema after the fact

Since we were adding the schema after many developers in our organization were already using the chart, we wanted to prevent it from becoming a problem for those existing users. Thankfully, Helm stores the values used as part of the release state in the cluster (in Helm 3, a Secret resource by default).

Utilizing commands such as helm ls -o json -A and helm status -n my-namespace my-release -o json | jq .config > my-namespace-my-release-values.json (this assumes you have jq installed), we were able to write some scripts to aid us in development:

  1. By getting the actual values in use in production and lower environments, it helped us to ensure we didn’t miss any attributes as we wrote the JSON Schema.
  2. It also helped us identify places where users had inadvertently added attributes that were never supported, kept attributes that were no longer supported, or used types that could be interpreted incorrectly by the YAML processor (see “YAML silliness”, above). By identifying these issues, we were able to submit PRs and/or document potential issues in the upgrade notes.
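As a sketch of the kind of script we mean (the filename pattern is an assumption), once each release's values have been dumped to JSON with the helm and jq commands above, collecting the set of attribute paths actually in use takes only a few lines of Python:

```python
import glob
import json

def paths(obj, prefix=""):
    """Yield dotted key paths for every attribute in a nested values dict."""
    for key, value in obj.items():
        path = f"{prefix}{key}"
        yield path
        if isinstance(value, dict):
            yield from paths(value, f"{path}.")

# Union of attribute paths across all dumped release values, e.g. files
# produced by: helm status ... -o json | jq .config > <ns>-<release>-values.json
in_use = set()
for filename in glob.glob("*-values.json"):
    with open(filename) as f:
        in_use.update(paths(json.load(f)))

for path in sorted(in_use):
    print(path)
```

Diffing this list against the properties declared in the draft schema quickly surfaces both attributes the schema missed and attributes users set that the chart never supported.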

Finally, we considered the introduction of a schema to be a breaking change, and as we follow semantic versioning on our Helm chart, we released it with a major version bump, which indicates to our users that they should look at the chart README and determine whether they need to make any changes.

Summary

Using JSON Schema made working with YAML more reliable, similar to how using strong types with Groovy or other optionally-typed languages can help catch issues earlier in the development process.

While great care should be taken in defining your chart values and thoughtfully making breaking changes, a JSON schema can help both the chart’s developers, and its users, in numerous ways.

