Dec 9, 2021

Airflow + Helm: Simple Airflow Deployment

This is part two of a five-part series addressing Airflow at an enterprise scale. I will update these with links as they are published.

Previously, we formulated a plan to provision Airflow in a Kubernetes cluster using Helm and then build up the supporting services and configuration we will need to make the cluster production-ready. This post focuses on deploying the Helm chart to our Kubernetes service. Even this most basic configuration requires a metadata database, and we have chosen PostgreSQL.

Code samples can be found here.

Preparing the Database

Here we will assume that you have:

  • A PostgreSQL database server
  • Credentials with administrative access
  • The psql CLI installed locally

I will be using the Azure PostgreSQL Service, but any compatible version will do. First, log into the database server using the psql command:

psql "host=************.postgres.database.azure.com port=5432 dbname=postgres user=**************** password=********* sslmode=require"

Next, referring to the Airflow documentation, we can execute the following commands:

CREATE DATABASE airflow_db;
CREATE USER airflow WITH PASSWORD 'your-password';
GRANT ALL PRIVILEGES ON DATABASE airflow_db TO airflow;
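Before moving on, it can be worth confirming that the new role can actually reach its database. A quick check along these lines (the host and password here are placeholders, not real values):

```shell
# Connect as the new airflow role and run a trivial query
psql "host=some-host.postgres.database.azure.com port=5432 dbname=airflow_db user=airflow password=your-password sslmode=require" -c "SELECT 1;"
```

If this returns a single row, the credentials you will put in the values file are known to work.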

Pulling the Chart and Value File

After the database is set up, we can move on to preparing the chart and our values file. Using Helm, add the airflow chart repository:

helm repo add apache-airflow https://airflow.apache.org
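With the repository added, a quick sanity check (assuming Helm 3) confirms the chart is visible locally:

```shell
# Refresh the local chart index and confirm the Airflow chart is available
helm repo update
helm search repo apache-airflow/airflow
```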

For the values file, retrieve the default values from the chart.

curl https://raw.githubusercontent.com/apache/airflow/main/chart/values.yaml > values.yaml
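Note that values.yaml on the main branch can drift from the chart version you actually install. As an alternative, Helm can emit the defaults for a pinned chart release (the version here is an assumption; use whichever version you deploy):

```shell
# Dump the default values shipped with a specific chart release
helm show values apache-airflow/airflow --version 1.3.0 > values.yaml
```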

Set Airflow to use the KubernetesExecutor:

executor: "KubernetesExecutor"

Make sure we have some example DAGs to play with:

env:
 - name: AIRFLOW__CORE__LOAD_EXAMPLES
   value: "True"

Turn off the chart's bundled PostgreSQL resources, since we are bringing our own database:

postgresql:
 enabled: false

Input credentials and database information:

data:
 metadataConnection:
   user: airflow@some-host
   pass: your-password
   protocol: postgresql
   host: some-host.postgres.database.azure.com
   port: 5432
   db: airflow_db
   sslmode: require
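Under the hood, the chart combines these metadataConnection fields into a SQLAlchemy connection URI for Airflow. A minimal sketch of that assembly, using hypothetical placeholder values that mirror the block above:

```shell
# Placeholder values mirroring the metadataConnection block above.
# Note: the '@' in the Azure-style username must be percent-encoded in a URI.
DB_USER="airflow%40some-host"
DB_PASS="your-password"
DB_HOST="some-host.postgres.database.azure.com"
DB_NAME="airflow_db"

# The resulting connection string has roughly this shape:
echo "postgresql://${DB_USER}:${DB_PASS}@${DB_HOST}:5432/${DB_NAME}?sslmode=require"
```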

Deploying the Chart

Now that our values file is set up for the external database, we can deploy the chart. Authenticate with the cluster:

az aks get-credentials --name airflow-demo --resource-group airflow-demo

Add a namespace:

kubectl create ns airflow

The Airflow chart tends toward long install times, so increase the timeout as you install the chart:

helm upgrade \
       --install \
       -f values.yaml \
       --namespace airflow \
       --timeout 30m0s \
       --wait=false \
       airflow \
       apache-airflow/airflow
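Because we passed --wait=false, Helm returns before the pods are actually ready. One way to follow the rollout (the release and namespace names match the command above):

```shell
# Check the release, then watch pods come up in the airflow namespace
helm status airflow --namespace airflow
kubectl get pods --namespace airflow --watch
```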

After Helm exits, we can navigate to our Kubernetes Dashboard and see the replica sets, pods, etc., that have been provisioned.

[Image: Airflow pods running in Azure Kubernetes Service.]

Now we can log in to Airflow using the credentials provided in the Helm output. As we didn’t enable the ingress feature of the chart, access to the Airflow webserver requires port forwarding:

kubectl port-forward svc/airflow-webserver 8080:8080 --namespace airflow

Navigating to http://localhost:8080 will bring up the login screen. After entering the credentials from the Helm output, you’ll see a table of DAGs.

[Image: Airflow Home Page with table of DAGs.]

To test our installation, unpause a DAG using the toggle on the left side of the screen and execute it. We expect a number of pods to be created as the tasks execute.
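The same test can be run from the CLI by executing into the scheduler pod (the deployment name here follows from the "airflow" release name, and example_bash_operator is one of the bundled example DAGs):

```shell
# Trigger an example DAG from inside the scheduler pod via the Airflow CLI
kubectl exec -it deploy/airflow-scheduler --namespace airflow -- airflow dags trigger example_bash_operator
```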

[Image: Demonstrating the connection between DAG Runs, Tasks and Pods.]

And that’s it: we have an Airflow cluster up and running. Now we can work on tuning it to better fit our needs. The next installment in this five-part series will cover logging in Apache Airflow!

About the Author


Jacob Nosal

Sr Consultant
