Jan 28, 2021

Exploring the CloudEvents Landscape

The CloudEvents specification provides a standardized way of describing events that is meant to be consistent, accessible, and portable. The idea behind the specification is simple: lots of cloud systems and tools produce and consume events, so let’s make those events useful! The specification is owned by the CNCF. The project reached 1.0 in October 2019 and is an Incubating-level CNCF project as of today, January 2021.

In theory, this specification gives users of many systems the ability to create and process events with a standard structure and common metadata that can be used across languages, clouds, and protocols. In an interview with InfoQ, Clemens Vasters, a principal architect of eventing and messaging, said:

The goal was to provide an industry definition and open framework for what an “event” is, what its minimal semantic elements are, and how events are encoded for transfer and how they are transferred and do so using the major encodings and application protocols in use today rather than inventing new ones.

There’s also some significant confusion about the intent behind this specification. Is it for machine-only consumption? Should human-readable events be emitted in this format for consumption elsewhere? The CloudEvents specification would say that any “event” type can be encoded in an appropriate way for both human and machine consumption. Some major open source projects that have eventing systems have not yet integrated with the specification. Kubernetes, one of the flagships of the CNCF, has discussed applying this specification without any movement towards implementation. The KEDA project, built in partnership between Microsoft and Red Hat, is staying with Kubernetes’ strategy for the time being. Strong integration between KEDA and CloudEvents feels natural given Microsoft’s role in the specification, but no movement has been made on that front at this time.

On the other hand, many of the emerging sub-projects within the Kubernetes landscape have adopted CloudEvents as a main format for distributing cloud-native, interoperable events. Tekton, Knative, and the Serverless.com Event Gateway all support this format, even though the three open source projects are owned by separate organizations.

What’s in the Spec?

So, what’s in this “common format” that can help us understand the data and metadata of a portable event? You can read it for yourself, but I’ll break down the basics here.

A cloud event is made up of required and optional attributes, plus a mapping between these attributes and a protocol or encoding. There are two ways to encode the message, which the specification calls content modes: binary and structured. If the protocol keeps the message metadata separate from its data, then the message is binary. If the data and metadata are encoded together into a single structured object, then the message is structured.

A great example of the difference between these message formats is an HTTP request vs a JSON document. An HTTP request comes with a set of meta-attributes, the HTTP headers, and so the natural protocol binding for cloud events to HTTP is a binary message. On the other hand, each protocol binding must support both message formats, so you can also send an HTTP request whose Content-Type names the structured format the whole event is encoded in, such as application/cloudevents+json. To understand this further, let’s think about JSON.

JSON by itself does not segregate information between metadata and data. There is a single document holding all the information without any necessary distinction between different fields. Therefore, HTTP can be used as a structured message by encoding the full cloud event into a JSON document. A JSON-encoded event can in turn carry binary data by storing it, base64-encoded, in the “data_base64” member and describing it with the “datacontenttype” attribute. It’s turtles all the way down!
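To make the two modes concrete, here is a minimal Python sketch of how one event might be mapped onto an HTTP request each way. The event contents and the function names are made up for illustration; the “ce-” header prefix and the application/cloudevents+json media type come from the HTTP binding and JSON format documents, and this is nowhere near a full implementation of either.

```python
import base64
import json

# Hypothetical event, used only to illustrate the two content modes.
event = {
    "specversion": "1.0",
    "id": "1234",
    "source": "/example/source",
    "type": "com.example.thing.created",
    "datacontenttype": "application/json",
    "data": {"hello": "world"},
}


def to_binary_http(event):
    """Binary mode: attributes become ce-* headers, the data becomes the body."""
    headers = {
        f"ce-{name}": str(value)
        for name, value in event.items()
        if name not in ("data", "datacontenttype")
    }
    data = event.get("data")
    if data is None:
        return headers, b""  # an event does not have to carry data
    # The media type of the data travels in the ordinary Content-Type header.
    headers["Content-Type"] = event.get("datacontenttype", "application/json")
    body = data if isinstance(data, bytes) else json.dumps(data).encode("utf-8")
    return headers, body


def to_structured_http(event):
    """Structured mode: metadata and data are encoded together as one JSON document."""
    envelope = dict(event)
    if isinstance(envelope.get("data"), bytes):
        # Raw bytes cannot be embedded in JSON, so they move to data_base64.
        envelope["data_base64"] = base64.b64encode(envelope.pop("data")).decode("ascii")
    headers = {"Content-Type": "application/cloudevents+json"}
    return headers, json.dumps(envelope).encode("utf-8")
```

Calling to_binary_http(event) yields headers such as ce-id and ce-type alongside a body that contains only the data, while to_structured_http(event) yields a single Content-Type header and a body that contains everything.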

The required attributes of a cloud event are fairly simple. They are:

  • id – a string that is unique when combined with the source field
  • source – a URI that identifies the context in which the event occurred
  • specversion – currently, just the string “1.0” to denote compliance with the CloudEvents 1.0 specification
  • type – the contextual information about the type of data encoded in the Event. This should be a reverse DNS address, such as com.mycompany.thing and can be versioned

Event data itself is optional! You can provide information that an event occurred without providing any additional information.

Standard optional attributes of a cloud event are:

  • datacontenttype – describes the format and encoding of the event data
  • dataschema – a URI that identifies the schema and versioning information for the event data
  • subject – a string that describes the subject of the event from the perspective of the event producer. See more info here.
  • time – an RFC 3339 timestamp
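As a rough illustration of the attribute rules above, the following Python sketch checks a dictionary for the four required attributes and sanity-checks the optional time attribute. The function name is hypothetical, and datetime.fromisoformat is only an approximation of RFC 3339 parsing, so treat this as a starting point and not as a conformance test.

```python
from datetime import datetime

REQUIRED = ("id", "source", "specversion", "type")


def validate(event):
    """Return a list of problems; an empty list means these basic checks passed."""
    problems = [
        f"missing required attribute: {name}"
        for name in REQUIRED
        if not event.get(name)
    ]
    # specversion must be the string "1.0", not the number 1.0.
    if event.get("specversion") not in (None, "1.0"):
        problems.append("specversion must be the string '1.0'")
    if "time" in event:
        try:
            # fromisoformat rejects a trailing 'Z' before Python 3.11.
            datetime.fromisoformat(event["time"].replace("Z", "+00:00"))
        except (ValueError, AttributeError):
            problems.append("time is not a valid RFC 3339 timestamp")
    return problems
```

Note that the data itself is never inspected: an event with no data at all still passes, which matches the specification’s view that data is optional.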

I would like to work through a simple example of this using something completely contrived. Let’s say I’ve got a company that wants to help you manage your home appliances by providing you great information about how they are operating. I turn on the dishwasher and out pops an event:

id: 00000000-0001-0002-0003-867530986753
source: nifty-home-automation.gov/appliance-watcher
specversion: 1.0
type: gov.nifty-home-automation.applicance_startup.dishwasher.v1
datacontenttype: application/json
subject: DishWasher3000
time: 2021-01-21T22:00:00-05:00
data: {
  "started_by": "HAL",
  "number_of_clean_dishes": 0,
  "number_of_dirty_dishes": 99
}

We’ve got a great little event here. We see our company and the type of data it is producing. There is a UUID identifying this event. Notice that since the data format is specific to the type of appliance, the type of the event specifically calls out that it is a dishwasher event.
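For comparison, here is the same dishwasher event written as a Python dictionary and serialized as a structured-mode JSON document. The values are the ones from the example; specversion is quoted because it must be a string, datacontenttype is written as the media type application/json, and the variable name is just a convenience.

```python
import json

dishwasher_event = {
    "id": "00000000-0001-0002-0003-867530986753",
    "source": "nifty-home-automation.gov/appliance-watcher",
    "specversion": "1.0",  # a string, not the number 1.0
    "type": "gov.nifty-home-automation.applicance_startup.dishwasher.v1",
    "datacontenttype": "application/json",
    "subject": "DishWasher3000",
    "time": "2021-01-21T22:00:00-05:00",
    # Everything specific to the dishwasher lives under "data".
    "data": {
        "started_by": "HAL",
        "number_of_clean_dishes": 0,
        "number_of_dirty_dishes": 99,
    },
}

# In structured mode this JSON text would be the whole message body.
print(json.dumps(dishwasher_event, indent=2))
```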

We’re going to come back later and look at how our nifty little company can process the events that we have created.

CloudEvent Supporting Tools

Of course, in any specification, it is critical that a landscape of tools emerges that can support the necessary actions defined in the specification. There are different categories of tools that we can evaluate in this space. An event itself has a few phases and we can think about the tools in relation to the event phase.

An event goes through three major phases in its lifecycle: producing, brokering, and consuming. CloudEvents supports sequencing modifications to events, which allows for looping through these phases multiple times for a single event. The tools outlined below play a role in one or more of those event phases.

First, for writing custom producers and consumers, there are SDKs in many languages that support the variety of defined protocols and encodings. Each CloudEvents SDK has its own set of technical requirements and must support the structured and binary encodings discussed above.

Next, for brokering events, there are optional specifications defined for many common open source data brokers. Many standard protocols and encodings for the transfer of information, such as HTTP, Kafka, and Avro, can be found in these additional specifications.

Adding on to this capability, adapters exist that operate on the non-conformant events that are published by third-party tools. These, together with the proprietary specifications, extend the reach of CloudEvents beyond the open source tools that have grown up natively using this format.
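The adapter idea can be sketched in a few lines of Python. Everything specific here is hypothetical: the legacy payload shape (a dictionary with a “kind” field), the source path, and the com.example.legacy type prefix are invented for illustration and do not describe any real adapter.

```python
import uuid
from datetime import datetime, timezone


def adapt_legacy_event(legacy, source="/adapters/legacy-thermostat"):
    """Wrap a non-conformant payload in CloudEvents 1.0 attributes."""
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),  # unique per event for this source
        "source": source,
        # The legacy "kind" field becomes part of a reverse-DNS style type.
        "type": f"com.example.legacy.{legacy['kind']}",
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        # The rest of the legacy payload is carried over unchanged as data.
        "data": {k: v for k, v in legacy.items() if k != "kind"},
    }


legacy = {"kind": "temperature_changed", "device": "t-100", "celsius": 21.5}
cloud_event = adapt_legacy_event(legacy)
```

The original payload is left untouched under data, so existing consumers of the legacy format can still read it once they unwrap the event.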

Another powerful tool in this landscape is the capability to filter and operate on events. One tool often used for CloudEvents is the CEL Spec and its implementations. CEL, the Common Expression Language, is a general-purpose expression language that can be used to match cloud events and operate on them. Tekton and Knative both use CEL extensively for their event handling systems.

How would we use CEL to operate on the events from our nifty appliance watcher company? Let’s say we want to find out who is starting these appliances because our water bill is going through the roof. We write an expression:

ce.type.matches("gov.nifty-home-automation.applicance_startup.*") && 
ce.source == "nifty-home-automation.gov/appliance-watcher"

Now, we can parse the data field for the “started_by” attribute and see who keeps turning on the dishwasher!
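If you don’t have a CEL runtime at hand, the same filter can be approximated in plain Python. This is a sketch of the logic, not CEL itself: re.search is used on the assumption that CEL’s matches function is unanchored, and the function name started_by is simply borrowed from the data field it reads.

```python
import re

# Same pattern and source as in the CEL expression above.
TYPE_PATTERN = re.compile(r"gov.nifty-home-automation.applicance_startup.*")
SOURCE = "nifty-home-automation.gov/appliance-watcher"


def started_by(event):
    """Return who started the appliance, or None if the event does not match."""
    if TYPE_PATTERN.search(event["type"]) and event["source"] == SOURCE:
        # data is optional on a cloud event, so fall back to an empty dict.
        return (event.get("data") or {}).get("started_by")
    return None


event = {
    "type": "gov.nifty-home-automation.applicance_startup.dishwasher.v1",
    "source": "nifty-home-automation.gov/appliance-watcher",
    "data": {"started_by": "HAL"},
}
print(started_by(event))
```

Because the filter only looks at type and source, it works the same whether the event arrived in binary or structured form; only the final lookup touches the data.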

One final point to note is that the meta-attribute support integrates well with distributed tracing systems. Jaeger and Zipkin both support passing trace headers along via cloud events, so you can visualize the interaction between the source systems that produce events and the consuming systems that act on them. Once that data is all flowing into the same visualization system, rich tracing information lets you highlight failed dependencies and dig through the complex web of interactions that may stem from a single source.
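As a rough sketch of how trace context can ride along with an event, the snippet below attaches a traceparent attribute, the name used by the CloudEvents distributed tracing extension, in the W3C Trace Context layout (version, trace id, parent span id, flags). The helper names are hypothetical, and a real system would let its tracing library generate and propagate these values.

```python
import secrets


def new_traceparent():
    """Build a W3C-style traceparent: version-traceid-spanid-flags."""
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    return f"00-{trace_id}-{span_id}-01"


def with_trace(event, traceparent=None):
    """Return a copy of the event carrying a traceparent extension attribute."""
    traced = dict(event)
    traced["traceparent"] = traceparent or new_traceparent()
    return traced


def child_traceparent(traceparent):
    """Keep the trace id but mint a new span id for the consuming system."""
    version, trace_id, _parent_span, flags = traceparent.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"
```

A producer would call with_trace before publishing, and each consumer would derive its own span with child_traceparent, so every hop shares one trace id that the tracing backend can stitch together.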

Wrapping Up

So, if you have a system that needs to produce and consume data in a variety of formats and broker that data across multiple protocols and consumer types, cloud events may be exactly the specification you are looking for. Many integrations already exist in the cloud landscape, and they are listed at the CloudEvents site. I think more tools are going to begin supporting this specification, especially vendors, as the multi-cloud world becomes a reality for these providers. Many cloud vendors already top the list of the integrations published on that site.

Keep an eye on this evolving space as the CNCF continues to support the growth of this specification across protocols and tools.

About the Author

Object Partners profile.