Sep 14, 2021

Snowflake CI/CD using Jenkins and Schemachange

CI/CD and management of data warehouses can be a serious challenge. In this blog you will learn how to set up CI/CD for Snowflake using Schemachange, GitHub, and Jenkins. For access to the code, check out this repo.

Schemachange is a simple Python-based tool for managing all of your Snowflake objects. I typically prefer a declarative approach to DevOps, but it can be pretty difficult when data migration is involved. Schemachange instead takes an imperative approach to database change management. Combined with a version control system and a CI/CD tool, database changes can be approved and deployed through a pipeline using modern software delivery practices.

Schemachange uses a directory structure like the following:
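For example (the file names here are just illustrative; what matters is the versioned V<major>.<minor>.<patch>__<description>.sql naming convention that schemachange keys off of):

```
migrations/
├── V0.0.1__init_scripts.sql
├── V0.0.2__add_orders_table.sql
└── V0.1.0__grants.sql
```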

Step-By-Step Guide

Prerequisites:

1. A Snowflake account
2. A Snowflake database
3. A Snowflake user created with appropriate permissions
4. A GitHub repository
5. A Jenkins server
a. A Jenkins agent with Python 3.7 or greater

Step 1. Create GitHub Repo
After cloning the repo to your local machine, add a folder called migrations.
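That step can be sketched like this (run from the root of your cloned repo):

```shell
# Create the folder that schemachange will scan for versioned scripts.
mkdir -p migrations
# Git does not track empty folders, so add a placeholder file
# until the first migration script lands.
touch migrations/.gitkeep
```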

Step 2. Write and Commit Init Scripts and Jenkinsfile
Create a file named V0.0.1__init_scripts.sql (make sure to have two underscores after the version).
Add the following to the file.
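As an illustrative stand-in (the exact contents are up to you; the schema and table names below are made up for the demo), a minimal V0.0.1__init_scripts.sql could be:

```sql
-- V0.0.1__init_scripts.sql
-- Illustrative initial objects; replace with your own DDL.
CREATE SCHEMA IF NOT EXISTS DEMO;

CREATE TABLE IF NOT EXISTS DEMO.CUSTOMERS (
    CUSTOMER_ID NUMBER AUTOINCREMENT,
    FIRST_NAME  VARCHAR,
    LAST_NAME   VARCHAR,
    CREATED_AT  TIMESTAMP_LTZ DEFAULT CURRENT_TIMESTAMP()
);
```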

After this create a Jenkinsfile and insert the following code:

#!/usr/bin/env groovy
pipeline {
  agent {label 'your-docker-agent-name-here'}
  environment {
    VIRTUALENV                  = 'snowflake_dev'                           // name of virtual environment
    PYTHON_VERSION              = 'python3.7'                               // can be python3.8, python3.9
    PIP_VERSION                 = 'pip3.7'                                  // can be pip3.8, pip3.9
    PROJECT_FOLDER              = 'migrations'                              // name of project folder where scripts are located
    SF_ACCOUNT                  = 'YOUR SNOWFLAKE ACCOUNT NAME HERE'        // typically everything that comes before .snowflakecomputing.com in your account URL
    SF_USER                     = 'YOUR SNOWFLAKE USER ID HERE'
    SF_ROLE                     = 'NAME OF ROLE HERE'                       // Typically requires create, update, delete privileges
    SF_WH                       = 'NAME OF YOUR WAREHOUSE HERE'
    SF_DB                       = 'NAME OF YOUR DATABASE HERE'
    SF_CH                       = 'NAME OF YOUR CHANGE HISTORY TABLE HERE'
    SECRET_LOCATION             = '/home/jenkins/snowflake_pk'              // If you are using a Key Pair Authentication
    JENKINS_CRED_ID_SECRET_FILE = 'YOUR SECRET FILE JENKINS NAME HERE'      // If you are using a Key Pair Authentication
    JENKINS_CRED_ID_SECRET      = 'YOUR SECRET PASSWORD JENKINS NAME HERE'  // If you are using a Key Pair Authentication
    SCHEMACHANGE                = "${WORKSPACE}/${VIRTUALENV}/lib/${PYTHON_VERSION}/site-packages/schemachange/"
  }
  stages {
    stage('Deploying Changes To Sandbox Snowflake') {
      steps {
        withCredentials([file(credentialsId: "${JENKINS_CRED_ID_SECRET_FILE}", variable: 'SNOWFLAKE_PRIVATE_KEY_FILE'),
                         string(credentialsId: "${JENKINS_CRED_ID_SECRET}", variable: 'SNOWFLAKE_PRIVATE_PASSPHRASE')]) {
          writeFile file: "${SECRET_LOCATION}", text: readFile(SNOWFLAKE_PRIVATE_KEY_FILE)
          sh """#!/bin/bash -x
                  echo "PROJECT_FOLDER ${PROJECT_FOLDER}"
                  echo 'Step 1: Installing schemachange'
                  virtualenv ${VIRTUALENV} -p ${PYTHON_VERSION}
                  source ${VIRTUALENV}/bin/activate
                  ${PIP_VERSION} install schemachange --upgrade
                  echo 'Step 2: Running schemachange' 
                  export SNOWFLAKE_PRIVATE_KEY_PATH=${SECRET_LOCATION}
                  export SNOWFLAKE_PRIVATE_KEY_PASSPHRASE="\$SNOWFLAKE_PRIVATE_PASSPHRASE"
                  ${PYTHON_VERSION} ${SCHEMACHANGE} -f ${PROJECT_FOLDER} -a ${SF_ACCOUNT} -u ${SF_USER} -r ${SF_ROLE} -w ${SF_WH} -d ${SF_DB} -c ${SF_CH} -v
          """
        }
      }
    }
  }
}

The schemachange command does have an argument (--create-change-history-table) that creates the change history table for you, but depending on your enterprise’s Snowflake user security you may not be able to use it. So I would recommend creating the change history table yourself and specifying it in the schemachange command.

Step 3. Set Up Jenkins Server and Create Jenkins Pipeline
Make sure your Jenkins agent can use Python 3.7 or greater and allows the use of Jenkins pipelines.
I used a multibranch pipeline to allow for non-prod and prod deployments.
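A quick sanity check on the agent (assuming python3 is on the PATH) might look like:

```shell
# Verify the agent's Python meets schemachange's minimum version (3.7+).
python3 -c 'import sys; assert sys.version_info >= (3, 7), sys.version'
echo "Python version OK"
```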

Step 4. Integrate GitHub and Jenkins using a webhook
Copy the webhook URL from Jenkins and add it to your GitHub repository’s webhook settings so that pushes to the repo trigger builds.

Step 5. Create a private/public key pair
We are going to create an encrypted private key and its matching public key.

openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8

openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub

Step 6. Add public key to Snowflake and add Change History Table
In your Snowflake account, execute the following query, replacing the user name and public key with your own.

alter user jsmith set rsa_public_key='MIIBIjANBgkqh...';
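To get the single-line key body for that statement, one option is to strip the PEM header, footer, and newlines from the public key file. This sketch generates a throwaway key pair first purely so it is self-contained; in practice you would run only the last line against the rsa_key.pub created in Step 5:

```shell
# Generate a sample key pair (illustration only; use your real key from Step 5).
openssl genrsa -out rsa_key_demo.pem 2048 2>/dev/null
openssl rsa -in rsa_key_demo.pem -pubout -out rsa_key.pub 2>/dev/null

# Strip the PEM header/footer and newlines so the key body can be
# pasted directly into the ALTER USER statement.
grep -v "PUBLIC KEY" rsa_key.pub | tr -d '\n'
```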

After altering your user, create your change history table. You don’t have to use the SCHEMACHANGE schema; you can use your own. You just have to specify it in your Jenkinsfile (the SF_CH variable).
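As a sketch, the table schemachange expects looks like the following (column list per the schemachange docs; the schema name is up to you):

```sql
CREATE SCHEMA IF NOT EXISTS SCHEMACHANGE;

CREATE TABLE IF NOT EXISTS SCHEMACHANGE.CHANGE_HISTORY (
    VERSION        VARCHAR,
    DESCRIPTION    VARCHAR,
    SCRIPT         VARCHAR,
    SCRIPT_TYPE    VARCHAR,
    CHECKSUM       VARCHAR,
    EXECUTION_TIME NUMBER,
    STATUS         VARCHAR,
    INSTALLED_BY   VARCHAR,
    INSTALLED_ON   TIMESTAMP_LTZ
);
```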


Step 7. Once the Jenkins pipeline is set up, either execute a build manually or push a change to your repo to kick one off.

2.9.2 vs 2.9.4
The 2.9.4 release allows the use of unencrypted key pairs, but it was not available at the time this blog was written.

Thanks for checking it out. If you have any questions, feel free to leave a comment or open an issue on the GitHub repo!

About the Author

Scott Poulin

Sr. Consultant