Sep 14, 2021

Snowflake CI/CD using Jenkins and Schemachange

CI/CD and Management of Data Warehouses can be a serious challenge. In this blog you will learn how to setup CI/CD for Snowflake using Schemachange, Github, and Jenkins. For access to the code check out this repo.

Schemachange is a simple python based tool to manage all of your Snowflake objects. Typically, I appreciate a declarative approach for DevOps, but can be pretty difficult when data migration is involved. Schemachange takes an imperative approach to Database Change Management. The combination of a version control system and CI/CD tool, database changes can be approved and deployed through a pipeline using modern software delivery practices.

Schemachange uses a directory structure like the following:

Step-By-Step Guide

Prerequisites
1. Snowflake Account
2. A Snowflake Database
3. A Snowflake User created with appropriate permissions
4. Github Repository
5. Jenkins Server
a. Jenkins Agent with Python 3.7 or greater

Step 1. Create Github Repo
After cloning the repo to your local machine add a folder called migrations

Step 2. Write and Commit init scripts and Jenkinsfile
Insert a file named V0.0.1__init_scripts.sql (make sure to have two under scores after the version)
Add the following to the file

After this create a Jenkinsfile and insert the following code:

#!/usr/bin/env groovy
pipeline {
  agent {label 'your-docker-agent-name-here'}
  environment {
    VIRTUALENV                  = 'snowflake_dev'                           // name of virtual environment
    PYTHON_VERSION              = 'python3.7'                               // can be python3.8, python3.9
    PIP_VERSION                 = 'pip3.7'                                  // can be pip3.8, pip3.9
    PROJECT_FOLDER              = 'migrations'                              // name of project folder where scripts are located
    SF_ACCOUNT                  = 'YOUR SNOWFLAKE ACCOUNT NAME HERE'        // typically everything that comes before snowflakecomputing.com
    SF_USER                     = 'YOUR SNOWFLAKE USER ID HERE'
    SF_ROLE                     = 'NAME OF ROLE HERE'                       // Typically requires create, update, delete privileges
    SF_WH                       = 'NAME OF YOUR WAREHOUSE HERE'
    SF_DB                       = 'NAME OF YOUR DATABASE HERE'
    SF_CH                       = 'NAME OF YOUR CHANGE HISTORY TABLE HERE'
    SECRET_LOCATION             = '/home/jenkins/snowflake_pk'              // If you are using a Key Pair Authentication
    JENKINS_CRED_ID_SECRET_FILE = 'YOUR SECRET FILE JENKINS NAME HERE'      // If you are using a Key Pair Authentication
    JENKINS_CRED_ID_SECRET      = 'YOUR SECRET PASSWORD JENKINS NAME HERE'  // If you are using a Key Pair Authentication
    SCHEMACHANGE                = "${WORKSPACE}/${VIRTUALENV}/lib/${PYTHON_VERSION}/site-packages/schemachange/cli.py"
  }
  stages {
    stage('Deploying Changes To Sandbox Snowflake') {
      steps {
        withCredentials([file(credentialsId: "${JENKINS_CRED_ID_SECRET_FILE}", variable: 'SNOWFLAKE_PRIVATE_KEY_FILE'),
                         string(credentialsId: "${JENKINS_CRED_ID_SECRET}", variable: 'SNOWFLAKE_PRIVATE_PASSPHRASE')]) {
          writeFile file: "${SECRET_LOCATION}", text: readFile(SNOWFLAKE_PRIVATE_KEY_FILE)
          sh """#!/bin/bash -x
                  echo "PROJECT_FOLDER ${PROJECT_FOLDER}"
                  echo 'Step 1: Installing schemachange'
                  virtualenv ${VIRTUALENV} -p ${PYTHON_VERSION}
                  source ${VIRTUALENV}/bin/activate
                  ${PIP_VERSION} install schemachange --upgrade
                  export SNOWFLAKE_PRIVATE_KEY_PATH=${SECRET_LOCATION}
                  export SNOWFLAKE_PRIVATE_KEY_PASSPHRASE=${SNOWFLAKE_PRIVATE_PASSPHRASE}
                  echo 'Step 2: Running schemachange' 
                  ${PYTHON_VERSION} ${SCHEMACHANGE} -f ${PROJECT_FOLDER} -a ${SF_ACCOUNT} -u ${SF_USER} -r ${SF_ROLE} -w ${SF_WH} -d ${SF_DB} -c ${SF_CH} -v
           """
        }
      }
    }
  }
}

The command does have an argument that allows you to create a change table. But depending on your enterprise’s Snowflake user security you may not be able to use that argument. So I would recommend just creating the change history table and specify it in the schemachange command.

Step 3. Setup Jenkins Server and Create Jenkins Pipeline
Make sure your jenkins agent has the ability to use python3.7 or greater and allows the use of jenkins-pipelines.
I used a multi-branch pipeline to allow for non-prod and prod deployments.

Step 4. Integrate Github and Jenkins using Webhook
Using the jenkins webhook and adding it to github settings

Step 5. Create private/public key
We are going to create an encrypted private/public key.

openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8

openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub

Step 6. Add public key to Snowflake and add Change History Table
In your snowflake account execute the following query but replace the fields with your user and public key.

alter user jsmith set rsa_public_key='MIIBIjANBgkqh...';

After altering your user create your change history table. You don’t have to use SCHEMACHANGE schema. You can use your own. You just have to specify it in your jenkinsfile.

CREATE TABLE IF NOT EXISTS SCHEMACHANGE.CHANGE_HISTORY
(
VERSION VARCHAR
,DESCRIPTION VARCHAR
,SCRIPT VARCHAR
,SCRIPT_TYPE VARCHAR
,CHECKSUM VARCHAR
,EXECUTION_TIME NUMBER
,STATUS VARCHAR
,INSTALLED_BY VARCHAR
,INSTALLED_ON TIMESTAMP_LTZ
);

Step 7. Either execute a build or make a change to your repo after jenkins pipeline is setup to kick off a build.

Gotchas
2.9.2 vs 2.9.4
2.9.4 Release allows for the use of unencrypted key pairs, but 2.9.4 was not available at the time of writing this blog.

Thanks for checking it out. If you have any questions feel free to leave a comment or an issue on the github repo!

About the Author

Scott Poulin profile.

Scott Poulin

Sr. Consultant
Leave a Reply

Your email address will not be published. Required fields are marked *

Related Blog Posts
Infrastructure as Code – The Wrong Way
You are probably familiar with the term “infrastructure as code”. It’s a great concept, and it’s gaining steam in the industry. Unfortunately, just as we had a lot to learn about how to write clean […]
How to get your pull requests approved more quickly
TL;DR The fewer reviews necessary, the quicker your PR gets approved. Code reviews serve an essential function on any software codebase. Done right, they help ensure correctness, reliability, and maintainability of code. On many teams, […]
Kafka & Kubernetes: Scaling Consumers
Kafka and Kubernetes (K8s) are a great match. Kafka has knobs to optimize throughput and Kubernetes scales to multiply that throughput. On the consumer side, there are a few ways to improve scalability. Resource & […]
AWS RDS MYSQL Playground
Do you need a reliable database platform to hammer out some new application ideas? Or, maybe you’d like to learn MYSQL in a disposable environment? This Hashicorp Terraform MYSQL RDS Module will build all the […]