Enterprise Auth for Airflow: Azure AD

This is part three of a five-part series addressing Airflow at an enterprise scale. I will update these with links as they are published.

  • Airflow: Planning a Deployment
  • Airflow + Helm: Deploying the Chart Without any Nautical Puns
  • More Charts: Adding TLS to Airflow
  • Enterprise Authentication for Airflow: Azure AD

Previously, we added TLS to our Airflow cluster with Cert-Manager and LetsEncrypt. This post focuses on configuring enterprise auth for Airflow using Azure Active Directory App registrations. Microsoft Graph will also be used to manage the claims on the ID and access tokens.

Azure Active Directory App Registration

Azure Active Directory allows applications to register with an Azure AD Tenant, which creates a client configuration with your application’s settings, such as:

  • Single or Multi-tenant access
  • Redirect URL
  • Logout URL

Azure AD allows for each application to be assigned roles that are scoped to an application instance. We want to create an application configuration that meets the following criteria:

  • OAuth code grant flow for Airflow
  • Translate roles from Azure AD application configuration to FAB roles

A Note Regarding Redirect URLs

This was hard to find in any documentation, but I was able to deduce that the name used in the OAuth provider configuration determines the redirect URL: /oauth-authorized/provider_name. For a provider named azure, the path is /oauth-authorized/azure. Use this to configure your redirect URLs. Redirect URLs must be HTTPS unless you’re on localhost.
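The redirect URL rule can be sketched as a small helper. This is only an illustration of the pattern described above; the function name build_redirect_url and the host airflow.example.com are hypothetical and not part of Airflow or Flask AppBuilder.

```python
from urllib.parse import urlparse


def build_redirect_url(base_url: str, provider_name: str) -> str:
    """Build the redirect URL for an OAuth provider name.

    Hypothetical helper: it follows the /oauth-authorized/<provider_name>
    pattern and rejects plain HTTP unless the host is localhost.
    """
    parsed = urlparse(base_url)
    if parsed.scheme != "https" and parsed.hostname != "localhost":
        raise ValueError("Redirect URLs must use HTTPS unless the host is localhost")
    return f"{base_url.rstrip('/')}/oauth-authorized/{provider_name}"


# Example host names are placeholders.
print(build_redirect_url("https://airflow.example.com", "azure"))
print(build_redirect_url("http://localhost:8080/", "azure"))
```

Both values would be entered as redirect URLs in the app registration: one for the deployed cluster and one for local testing.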

Create the Application Registration

From the Azure portal, navigate to Azure Active Directory > App Registrations > New Registration and enter an application name and redirect URL.

Create the application, and add a localhost redirect URL and logout URL. Now we need to modify the claims on the ID and access tokens.

Navigate to the Token Configuration pane of the Azure AD application. For both ID and access tokens, add an optional claim for each of the following:

  • Email
  • Preferred_username
  • Given_name
  • Family_name
  • UPN

Edit the Groups Claim to include the (future) application roles in the token.

Navigate to the App Roles pane of the Azure AD application and add some relevant roles. This post uses airflow_nonprod_admin, airflow_nonprod_dev, and airflow_nonprod_viewer.
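If you prefer to prepare the role definitions as data instead of clicking through the portal, the following sketch generates entries in the shape I understand the app manifest's appRoles list to use. The field names are my assumption of that format, so verify them against your own manifest before pasting anything in; make_app_role is a hypothetical helper.

```python
import json
import uuid

# Role values used later in AUTH_ROLES_MAPPING.
ROLE_VALUES = [
    "airflow_nonprod_admin",
    "airflow_nonprod_dev",
    "airflow_nonprod_viewer",
]


def make_app_role(value: str) -> dict:
    """Return one app role entry (assumed manifest field names)."""
    return {
        "allowedMemberTypes": ["User"],
        "description": f"Airflow role {value}",
        "displayName": value,
        "id": str(uuid.uuid4()),  # each role needs its own unique ID
        "isEnabled": True,
        "value": value,  # this string is what appears in the token's roles claim
    }


app_roles = [make_app_role(value) for value in ROLE_VALUES]
print(json.dumps(app_roles, indent=2))
```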

webserver_config.py

To change how Airflow authenticates users, replace the webserver.webserverConfig parameter in airflow-values.yaml. The new configuration adds the OAuth provider, modifies the role mapping, and implements the get_oauth_user_info method to build a custom user info mapping from the token claims.

Navigate to your app in Azure AD and then Certificates and Secrets > Client Secrets. Add a secret and record it in a safe place for later use.

Full code examples are here.

import os

from airflow.configuration import conf
from airflow.utils.log.logging_mixin import LoggingMixin
from airflow.www.security import AirflowSecurityManager
from flask_appbuilder.security.manager import AUTH_OAUTH

SQLALCHEMY_DATABASE_URI = conf.get("core", "SQL_ALCHEMY_CONN")
basedir = os.path.abspath(os.path.dirname(__file__))
CSRF_ENABLED = True
AUTH_TYPE = AUTH_OAUTH

# Replace {tenant_id}, {your-client-id}, and {your-client-secret} with your own values.
OAUTH_PROVIDERS = [
    {
        # The provider name determines the redirect URL: /oauth-authorized/azure
        "name": "azure",
        "token_key": "access_token",
        "icon": "fa-windows",
        "remote_app": {
            "api_base_url": "https://login.microsoftonline.com/{tenant_id}",
            "request_token_url": None,
            "request_token_params": {
                "scope": "openid email profile"
            },
            "access_token_url": "https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token",
            "access_token_params": {
                "scope": "openid email profile"
            },
            "authorize_url": "https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/authorize",
            "authorize_params": {
                "scope": "openid email profile"
            },
            "client_id": "{your-client-id}",
            "client_secret": "{your-client-secret}",
        },
    }
]
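The three Azure URLs in the provider configuration differ only by tenant ID, so one way to avoid hard-coding them is to build them from a single value. The sketch below is an assumption-laden illustration: azure_endpoints is a hypothetical helper and the environment variable names AZURE_TENANT_ID, AZURE_CLIENT_ID, and AZURE_CLIENT_SECRET are names I made up; use whatever your deployment provides.

```python
import os


def azure_endpoints(tenant_id: str) -> dict:
    """Return the tenant-specific URLs used in the remote_app block."""
    base = f"https://login.microsoftonline.com/{tenant_id}"
    return {
        "api_base_url": base,
        "access_token_url": f"{base}/oauth2/v2.0/token",
        "authorize_url": f"{base}/oauth2/v2.0/authorize",
    }


# Hypothetical environment variable names; the fallbacks keep the placeholders visible.
tenant_id = os.environ.get("AZURE_TENANT_ID", "{tenant_id}")
client_id = os.environ.get("AZURE_CLIENT_ID", "{your-client-id}")
client_secret = os.environ.get("AZURE_CLIENT_SECRET", "{your-client-secret}")

remote_app_settings = {
    **azure_endpoints(tenant_id),
    "client_id": client_id,
    "client_secret": client_secret,
}
print(remote_app_settings["authorize_url"])
```

Reading the client secret from the environment also keeps it out of the values file, which may matter if that file lives in source control.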

Add the Role Mapping

Airflow provides Public, Viewer, User, Op, and Admin roles. You can map them however is useful for your organization, but I am mapping the custom roles we created in our app to the provided Airflow roles and leaving the Public role as the base role. If a user is able to log in but has no app roles assigned, sign-in will succeed but they won’t have access to any resources.

AUTH_USER_REGISTRATION_ROLE = "Public"
AUTH_USER_REGISTRATION = True
AUTH_ROLES_SYNC_AT_LOGIN = True
AUTH_ROLES_MAPPING = {
    "airflow_nonprod_admin": ["Admin"],
    "airflow_nonprod_dev": ["Op"],
    "airflow_nonprod_viewer": ["Viewer"]
}

I have set AUTH_ROLES_SYNC_AT_LOGIN so that the role mapping is recomputed at each login and a user's Airflow roles reflect the current state of Azure AD.
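To make the effect of the mapping concrete, here is a simplified illustration of how a list of role keys from a token could resolve to Airflow roles. This is not Flask AppBuilder's actual implementation; calculate_roles is a hypothetical function, and it assumes the registration role is always included alongside any mapped roles.

```python
AUTH_USER_REGISTRATION_ROLE = "Public"
AUTH_ROLES_MAPPING = {
    "airflow_nonprod_admin": ["Admin"],
    "airflow_nonprod_dev": ["Op"],
    "airflow_nonprod_viewer": ["Viewer"],
}


def calculate_roles(role_keys, mapping=AUTH_ROLES_MAPPING,
                    base_role=AUTH_USER_REGISTRATION_ROLE):
    """Resolve token role keys to Airflow role names (simplified sketch)."""
    roles = {base_role}  # assumption: the base role is always granted
    for key in role_keys:
        # Unknown keys map to nothing rather than raising an error.
        roles.update(mapping.get(key, []))
    return sorted(roles)


print(calculate_roles(["airflow_nonprod_dev"]))
print(calculate_roles([]))
```

A user with no app roles ends up with only the base role, which matches the behavior described above: sign-in succeeds, but nothing is accessible.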

Implementing get_oauth_user_info

We will stub out the implementation of get_oauth_user_info (https://flask-appbuilder.readthedocs.io/en/latest/api.html#flask_appbuilder.security.manager.BaseSecurityManager.get_oauth_user_info) so we can get a token and analyze it using jwt.io. This will help us build a dictionary representing our user info.

class AzureCustomSecurity(AirflowSecurityManager, LoggingMixin):
    def get_oauth_user_info(self, provider, response=None):
        if provider == "azure":
            id_token = response["id_token"]
            self.log.debug(str(id_token))
            me = self._azure_jwt_token_parse(id_token)
            return me
        else:
            return {}

SECURITY_MANAGER_CLASS = AzureCustomSecurity

Now log in, find the id_token in the webserver's stdout (it is logged at debug level), and paste it into jwt.io to decode it.
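If you would rather not paste a token into a website, the payload can also be decoded locally with the standard library. The sketch below only base64url-decodes the middle segment and does not verify the signature, so it is suitable for inspection and nothing more. The sample token is fabricated for the example; decode_jwt_payload and _b64url are hypothetical helper names.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("Expected a JWT with three dot-separated segments")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def _b64url(data: dict) -> str:
    """Encode a dict as an unpadded base64url JSON segment (for the demo token)."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Fabricated token for demonstration; a real one comes from your logs.
sample_claims = {"name": "Ada Lovelace", "roles": ["airflow_nonprod_admin"]}
sample_token = ".".join([_b64url({"alg": "none"}), _b64url(sample_claims), "signature"])

print(json.dumps(decode_jwt_payload(sample_token), indent=2))
```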

ID Token Payload

Now we can easily map our parsed ID token to the user info dictionary.

parsed_token = {
    "name": me["name"],
    "email": me["email"],
    "first_name": me["given_name"],
    "last_name": me["family_name"],
    "id": me["oid"],
    "username": me["preferred_username"],
    "role_keys": me["roles"],       
}
return parsed_token
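One caveat with the mapping as written: as far as I can tell, a user with no app roles assigned may receive a token with no roles claim at all, in which case me["roles"] would raise a KeyError. A more defensive variant, sketched here as a standalone function named build_user_info (a hypothetical name), falls back to an empty list so such users still land on the base role.

```python
def build_user_info(me: dict) -> dict:
    """Map parsed ID token claims to the user info dictionary.

    Uses .get() so that a missing optional claim does not fail the login.
    """
    return {
        "name": me.get("name", ""),
        "email": me.get("email", ""),
        "first_name": me.get("given_name", ""),
        "last_name": me.get("family_name", ""),
        "id": me["oid"],  # treated as required here: the user's object ID
        "username": me.get("preferred_username", ""),
        "role_keys": me.get("roles", []),  # assumed absent when no app roles are assigned
    }


# Fabricated claims for a user without any app roles.
claims = {"oid": "00000000-0000-0000-0000-000000000000", "name": "Ada Lovelace"}
print(build_user_info(claims))
```

Inside get_oauth_user_info, the same function could be called on the result of the token parse instead of building the dictionary inline.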

And we are done. Deploy the code and log into your Airflow cluster. Check out the profile page to see your calculated roles in Airflow.

Our role mappings worked! And now our cluster is secured.

About the Author

Object Partners profile.