Designing Kubernetes Controllers
There has been some excellent online discussion lately around Kubernetes controllers, highlighted by a Speakerdeck presentation assembled by Tim Hockin. What I’d like to do in this post is explore some of the implications of the model that the Kubernetes project has provided us for managing the state of a distributed system.
Let’s nail down some of the basics. The heart of Kubernetes, the API server, maintains the state of the cluster. That state is stored in an etcd database, but no other clients access that database directly. The API server is responsible for informing clients of the current state of the system. Clients register themselves with the API server as interested in updates on a certain set of resources; this is known in Kubernetes terminology as a watch.
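The watch pattern above can be sketched in a few lines. This is a deliberately minimal, language-agnostic sketch in Python with invented names (`ApiServer`, `watch`, `apply`), not the real client-go API: clients register interest in a resource kind, and every change to a matching object is pushed to them.

```python
# Hypothetical miniature of the watch pattern (not the real Kubernetes API).
class ApiServer:
    def __init__(self):
        self._watchers = {}  # resource kind -> list of callbacks

    def watch(self, kind, callback):
        """Register a client as interested in updates for one resource kind."""
        self._watchers.setdefault(kind, []).append(callback)

    def apply(self, kind, obj):
        """Persist the new state and notify every registered watcher."""
        for cb in self._watchers.get(kind, []):
            cb(obj)

events = []
api = ApiServer()
api.watch("Pod", events.append)
api.apply("Pod", {"name": "web-1", "phase": "Running"})
print(events)  # each watcher received the full new state of the object
```

Note that the watcher receives the whole object, not a description of what changed; that distinction drives everything that follows.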
Within this model, we have to remember some of the challenges of designing a robust distributed system. The network cannot be assumed to be reliable, have zero latency, or have infinite bandwidth. With these challenges in mind, you will never see Kubernetes providing a change set to its controllers, as it has to assume that any change set request may fail. Tracking the state of every downstream controller, and which change sets each one has successfully acted on, is an essentially intractable scaling problem, because it would require continuous bidirectional communication between the API server and every controller in the cluster.
Instead, Kubernetes’ approach is that these watch events will be provided only once to consumers. The data provided in each watch event consists of the entire current state of the object in question. Controllers are then responsible for using their understanding of the system to take this object and reconcile the current state with the desired state. The Kubernetes API server doesn’t want to know about reconciliation state or reconciliation logic; it delegates the implementation of that logic to controllers.
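To make the level-triggered idea concrete, here is a hedged sketch of a reconcile function. The object shapes and action names are invented for illustration; the point is that the input is the *entire* desired object, and the controller derives actions by comparing it to what currently exists, rather than replaying a change set.

```python
# Illustrative sketch of level-triggered reconciliation; field names are invented.
def reconcile(desired, current):
    """Return the actions needed to move `current` toward `desired`."""
    actions = []
    if current is None:
        # Nothing exists yet: the only way to converge is to create it.
        actions.append(("create", desired["name"]))
    elif current["replicas"] != desired["replicas"]:
        # Object exists but has drifted: converge by scaling.
        actions.append(("scale", desired["replicas"]))
    return actions  # empty list means current already matches desired

print(reconcile({"name": "web", "replicas": 3}, None))
print(reconcile({"name": "web", "replicas": 3}, {"name": "web", "replicas": 1}))
print(reconcile({"name": "web", "replicas": 3}, {"name": "web", "replicas": 3}))
```

Because the function only compares states, it is safe to run it repeatedly; delivering the same event twice simply produces an empty action list the second time.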
Now that the API server has delegated the responsibility to the controller, the controller itself needs to take all this information and make the necessary updates to the running system. How can controllers make this calculation? One thing is absolutely certain: controllers shouldn’t implement their own state management system to handle this reconciliation, because a world-class distributed state store is right there in the API server! This is where the design of the status field comes into play. Kubernetes controllers should use status fields to store, update, and maintain any data they need related to each individual object in the API server.
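A small sketch of that idea, with invented field names: the controller writes its bookkeeping back onto the object’s status rather than into its own database, so the record lives in the API server and survives controller restarts.

```python
# Sketch: keep controller bookkeeping in the object's status field.
# The status keys here (readyReplicas, reconciled) are illustrative.
def observe_and_update_status(obj, observed_ready):
    """Record what the controller observed back onto the object."""
    obj.setdefault("status", {})
    obj["status"]["readyReplicas"] = observed_ready
    obj["status"]["reconciled"] = observed_ready == obj["spec"]["replicas"]
    return obj

obj = {"spec": {"replicas": 3}}
observe_and_update_status(obj, 2)
print(obj["status"])
```

In a real controller the status update would be persisted through the API server, so a freshly restarted controller can read back exactly where processing left off.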
Now, we can get into the real meat of utilizing this system for actual work. Let’s lay out some key considerations:
- The relationship between the resource definition and controller is critical for effective utilization of Kubernetes in this model.
- Strong boundaries between related resource types can enable cleaner logic for managing state of a distributed system.
- Retrieving the entire current state (polling the world) will be necessary to avoid missing resource modifications.
The rest of this post assumes that you intend to encode custom logic into a Kubernetes API server via a Custom Resource Definition and manage it with a controller. This post will dive a little deeper into some of the concerns you should address before proceeding to implementation.
Resource Definition/Controller Relationship
Kubernetes custom resources, generally, are made up of four types of data:
- TypeMeta – metadata about the resource type
- ObjectMeta – metadata about the specific object instance
- Spec – the exposed capabilities/data of the resource
- Status – the current state of the object in the cluster
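The four sections above map directly onto a resource manifest. Here is a hypothetical custom object (the API group and kind are invented for illustration) annotated with which part of the manifest each type of data lives in:

```yaml
apiVersion: example.com/v1alpha1   # TypeMeta: the resource type
kind: Widget                       # TypeMeta
metadata:                          # ObjectMeta: name, namespace, labels, ...
  name: widget-sample
spec:                              # desired state, written by the user
  size: large
status:                            # current state, written by the controller
  ready: false
```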
The TypeMeta and ObjectMeta are always handled by the Kubernetes API server. Generally, behavior is not intended to be modified by object metadata, but Kubernetes resources like Ingress have had to take that approach because the Ingress spec fields did not provide a flexible enough interface to define all the possible Ingress behaviors. Ingress is a great example to consider as you define Kubernetes resource types. This resource sat in a beta state for many years as the project maintainers and community looked for a way to unify the specification with the disparate use cases of bringing HTTP(S) traffic into a Kubernetes cluster.
Let’s look at this from the perspective of designing a new resource and try to avoid this kind of challenge. Kubernetes will provide the controller with these four types of data. This is where the status field can be the fulcrum for us, and why designing the controller in conjunction with the object API is critical. We can use the status field to record the current processing state(s) and treat that information as the state of the controller itself.
Let’s ground this in a real-world example and outline some of the ways we could design a custom resource. As in my previous post, I love to use home maintenance as an example that connects us more physically to the concepts defined here. Here’s a simple example. Let’s say you are defining how to manage all the ingredients for dinner this evening. You can start with the resource type; let’s call it Ingredient. Our spec field will consist of the things we need to know about the ingredient, such as amount and unit of measure (oz, g, mL). Our metadata can contain information that is not critical to the recipe itself, such as brand name and cost. What will the status field contain?
This is where we need to understand what the controller is responsible for. If our ingredient maintenance system is responsible for locating and providing what we need at the appropriate times, it would probably be helpful to know where each ingredient is right now. The ingredient spec tells us exactly what we need (the desired state): “I need 1 tbsp of salt” for the recipe. But the job becomes far easier if, at the same time I ask for the salt, I tell you that I have already measured it out and the tablespoon is laid out on the counter. This is where the status field comes in to simplify the role of the controller.
This leads us nicely into the next section, so I’m going to pause the example for a moment to summarize. Design the status field in such a way that the controller provides data to itself in order to simplify its own processing.
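Putting the Ingredient example into manifest form might look like this. The API group, annotation key, and status fields are all hypothetical, chosen to show where each kind of information lands:

```yaml
apiVersion: kitchen.example.com/v1alpha1
kind: Ingredient
metadata:
  name: salt
  annotations:                       # non-critical details live in metadata
    kitchen.example.com/brand: Acme
spec:                                # desired state: what the recipe needs
  amount: 1
  unit: tbsp
status:                              # the controller's own bookkeeping
  located: true
  measured: true                     # already laid out on the counter
```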
Strong resource boundaries
As I was thinking about defining resource controllers in my home, I thought about the division of responsibilities between chefs in a kitchen. Think about the challenge of putting on a large dinner for friends (looking forward to those days again sometime) and the lines of communication required between those different individuals. Maintaining strong boundaries between Kubernetes resource definitions can keep our theoretical kitchen functioning well, even under stressful situations.
What do I mean by strong boundaries in this context? The first implication is not to create one controller to rule them all. This would be analogous to firing all of the supporting staff in our hypothetical kitchen and forcing our master chef to fill every role. Imagine how well that will go when things start going wrong. This can be a tempting pattern given the single place to look for failure, but the Kubernetes model is designed to let you easily maintain many independently operating controllers.
So now, we’ve separated our state definition into a number of different resource types. How do we design these distinct types to use Kubernetes’ reconciliation process effectively? The goal here is that related objects can be independently maintained and controlled without requiring bidirectional communication. The state and status of one object should only depend on the operation of itself and its immediately downstream resources. Kubernetes Deployments are a great example of this. If you look at the status field of a Deployment, you will see that it informs you about pod counts, but all it is doing is aggregating data from its “owned” ReplicaSets. As the owner of the downstream ReplicaSet objects, the Deployment controller subscribes to changes to the ReplicaSet status. It doesn’t report any information about the downstream pods or any object outside of its directly owned objects. The Knative Service (the ksvc) is another great specification to study in this vein, as its controller manages multiple downstream resources.
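The ownership pattern above can be sketched briefly. This is an illustrative Python sketch, not the Deployment controller’s actual code: the parent rolls up only the statuses of the objects it directly owns, and never reaches further downstream to the pods.

```python
# Sketch: a parent aggregates status only from its directly owned children,
# the way a Deployment summarizes its ReplicaSets. Field names are illustrative.
def aggregate_status(owned_replicasets):
    """Roll owned ReplicaSet statuses up into the parent's status."""
    return {
        "replicas": sum(rs["status"]["replicas"] for rs in owned_replicasets),
        "readyReplicas": sum(rs["status"]["readyReplicas"] for rs in owned_replicasets),
    }

owned = [
    {"status": {"replicas": 2, "readyReplicas": 2}},  # current ReplicaSet
    {"status": {"replicas": 1, "readyReplicas": 0}},  # old one scaling down
]
print(aggregate_status(owned))
```

Because each layer only looks one level down, every controller in the chain can be replaced or restarted independently without any other layer needing to know.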
Poll the World
Finally, we must accept that we will miss some events. Again, the network is not reliable; we should start to think of it more like a busy kitchen! Every once in a while we need to check in with the source of truth for current state to ensure that our controller cache is operating correctly. In the kitchen analogy, this is like taking inventory, checking the entire order as we are preparing the food, and reviewing recipes as we are working through the evening rush.
Now, the controller is going to have a cache of the current resources it is maintaining, and it can re-poll just those resources, but we have to understand that a creation event can be missed entirely. This is where a full re-list comes in. We can compare the set of resources the Kubernetes API server expects us to be operating on against the resources in the controller’s in-memory cache to ensure the two line up.
This is where one flexibility of the Kubernetes API server comes in: the finalizer. We can avoid the concern of missing a delete event by using finalizers, because the object is not deleted until every finalizer reports success back to the Kubernetes API server. A finalizer signals that it is done by removing itself from the list of finalizers on the object. Again, with an unreliable network, it is critical to think through the consequences of receiving the same finalizer request twice, so the cleanup logic needs to be safe to repeat. Fortunately, with this mechanism we never have to worry about a pre-delete action going uncompleted.
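A brief sketch of that repeat-safety concern, with an invented finalizer name and object shape: the same deletion handling may run twice on an unreliable network, so both the cleanup and the finalizer removal are written to be harmless the second time.

```python
# Sketch of idempotent finalizer handling; the finalizer name is hypothetical.
FINALIZER = "example.com/cleanup"

def handle_deletion(obj, cleanup):
    """Run pre-delete work, then remove our finalizer. Safe to call twice."""
    if FINALIZER in obj["metadata"].get("finalizers", []):
        cleanup(obj)  # the cleanup itself must also be idempotent
        obj["metadata"]["finalizers"].remove(FINALIZER)
    # If the finalizer is already gone, there is nothing left to do.
    return obj

calls = []
obj = {"metadata": {"finalizers": [FINALIZER]}}
handle_deletion(obj, lambda o: calls.append("cleaned"))
handle_deletion(obj, lambda o: calls.append("cleaned"))  # duplicate request
print(obj["metadata"]["finalizers"], calls)
```

The guard clause makes the duplicate request a no-op: the cleanup runs once, and the object becomes deletable only after the finalizer list is empty.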
This blog post was intended to cover some of the key considerations around designing Kubernetes custom resources and implementing controllers to manage those resources. As a final set of recommendations, consider the following items:
- Kubernetes provides us with a state, not a change set, so we need to use status effectively to build an efficient operator
- Keep the responsibility for your controllers small and segregate the control loops
- Use finalizers to ensure pre-delete actions are completed
- Avoid using object metadata, such as labels and annotations, to change resource behavior
Feel free to leave a comment and let me know how you go through this process defining your own Kubernetes custom resources!