Dec 9, 2021

Airflow + Helm: Simple Airflow Deployment

This is part two of a five-part series addressing Airflow at an enterprise scale. I will update these with links as they are published.

Previously, we formulated a plan to provision Airflow in a Kubernetes cluster using Helm and then build up the supporting services and various configurations that we will need to ensure our cluster is production-ready. This post will focus on getting the Helm chart deployed to our Kubernetes service. Even this most basic of configurations requires a metadata database, and we have chosen to use PostgreSQL in this case.

Code samples can be found here.

Preparing the Database

Here we will assume that you have:

  • A PostgreSQL database server
  • Credentials with administrative access
  • The psql CLI available

I will be using the Azure PostgreSQL Service, but any compatible version will do. First, log into the database server using the psql command:

psql "host=************ port=5432 dbname=postgres user=**************** password=********* sslmode=require"
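The quoted string passed to psql is a libpq-style connection string: space-separated key=value pairs, each mapping to a connection parameter. As a rough illustration of that format (not how psql itself parses it, since libpq also handles quoting and escaping), splitting it apart might look like:

```python
# Parse a libpq-style "key=value key=value" connection string into a dict.
# Illustrative only -- real libpq parsing also handles quoted values and
# escape sequences that this simple split does not.
def parse_conninfo(dsn: str) -> dict:
    return dict(pair.split("=", 1) for pair in dsn.split())

params = parse_conninfo("host=db.example.com port=5432 dbname=postgres sslmode=require")
print(params["host"], params["sslmode"])
```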

Next, referring to the Airflow documentation, we can execute the following commands to create the metadata database, its user, and the necessary grants:

CREATE DATABASE airflow_db;
CREATE USER airflow WITH PASSWORD 'your-password';
GRANT ALL PRIVILEGES ON DATABASE airflow_db TO airflow;

Pulling the Chart and Value File

After the database is set up, we can move on to preparing the chart and our values file. Using Helm, add the airflow chart repository:

helm repo add apache-airflow https://airflow.apache.org
helm repo update

For the values file, retrieve the default values from the chart.

helm show values apache-airflow/airflow > values.yaml

Set Airflow to use the KubernetesExecutor:

executor: "KubernetesExecutor"

Make sure we have some example DAGs to play with:

env:
  - name: "AIRFLOW__CORE__LOAD_EXAMPLES"
    value: "True"
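The AIRFLOW__CORE__LOAD_EXAMPLES variable follows Airflow's convention for configuration overrides via the environment: AIRFLOW, the config section, and the option name, separated by double underscores. A small sketch of how such a name maps back to a section/option pair (a hypothetical helper for illustration; Airflow does this mapping internally):

```python
# Map an Airflow-style env var name back to its (section, option) pair.
# Hypothetical helper -- Airflow performs the equivalent mapping itself
# when it reads configuration overrides from the environment.
def env_to_config(name: str):
    prefix, section, option = name.split("__", 2)
    if prefix != "AIRFLOW":
        raise ValueError(f"not an Airflow config variable: {name}")
    return section.lower(), option.lower()

print(env_to_config("AIRFLOW__CORE__LOAD_EXAMPLES"))  # ('core', 'load_examples')
```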

Turn off the chart's bundled PostgreSQL resources:

postgresql:
  enabled: false

Input credentials and database information:

data:
  metadataConnection:
    user: airflow@some-host
    pass: your-password
    protocol: postgresql
    host: some-host
    port: 5432
    db: airflow_db
    sslmode: require
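Under the hood, the chart assembles these credential and database values into a SQLAlchemy-style connection URI for Airflow's metadata database. A rough sketch of that assembly (the chart does this in its templates; the helper below is illustrative, and note the Azure-style user name contains an @ that must be percent-encoded inside a URI):

```python
from urllib.parse import quote

# Assemble a SQLAlchemy-style connection URI from the values-file fields.
# The '@' in the Azure-style user name must be percent-encoded so it is
# not mistaken for the userinfo/host separator.
def metadata_uri(user, password, protocol, host, port, db, sslmode):
    return (f"{protocol}://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{db}?sslmode={sslmode}")

uri = metadata_uri("airflow@some-host", "your-password", "postgresql",
                   "some-host", 5432, "airflow_db", "require")
print(uri)
```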

Deploying the Chart

Now that we have our values file set up for our database, we can deploy the chart. Authenticate with the cluster:

az aks get-credentials --name airflow-demo --resource-group airflow-demo

Add a namespace:

kubectl create ns airflow

The Airflow chart has a tendency toward long run times, so increase the timeout as you install the chart:

helm upgrade \
       --install \
       -f values.yaml \
       --namespace airflow \
       --timeout 30m0s \
       --wait=false \
       airflow \
       apache-airflow/airflow

After Helm exits, we can navigate to our Kubernetes Dashboard and see the replica sets, pods, etc., that have been provisioned.

[Image: Airflow pods running in Azure Kubernetes Service.]

Now we can log into the Airflow web UI using the credentials provided in the Helm output. As we didn’t enable the ingress feature of the chart, access to the web UI requires port forwarding:

kubectl port-forward svc/airflow-webserver 8080:8080 --namespace airflow

Navigating to http://localhost:8080 will bring up the login screen. After logging in with the credentials from the Helm output, you’ll see a table of DAGs.

[Image: Airflow Home Page with table of DAGs.]

To test our installation, unpause a DAG using the toggle on the left side of the screen and execute it. We expect a number of pods to be created as the tasks execute.

[Image: Demonstrating the connection between DAG Runs, Tasks and Pods.]

And that’s it, we have an Airflow cluster up and running. Now we can work on tuning the cluster to better fit our needs. The next installment in this five-part series will cover logging in Apache Airflow!
