HashiCorp’s Vault and Ansible Integrations
Introduction
After all the late nights I’ve spent juggling SSH keys for automation users with Ansible, it became clear that I needed a better way to manage the process. There are plenty of high-level services to help with this, but I like to know what’s happening at a lower level and to take a minimalist’s approach. A basic integration of Ansible and HashiCorp’s Vault seemed a likely place to start. The entire process remains lightweight and has enormous potential.
What you should already know
To get the most out of this article, there are a few things you should know first.
- A basic working knowledge of Ansible, VirtualBox, and Vagrant.
- An understanding of Host-Only Networking.
- An understanding of SSH.
The Problem
Ansible provides us with a really easy way to manage large numbers of machines provisioned with automation users. To illustrate the problem, I’ll assume a single, common automation user (golem). Now that I have servers and golems, I can use Ansible like this:
$ ansible -u golem --private-key /super/protected/golem-key -a 'echo Hello' serverlist
Great, right? Admittedly, it’s pretty nice to be able to do this, but now I have a /super/protected/golem-key to keep super protected. So maybe I start chmodding directories to limit access, but that’s pretty flimsy security. I could encrypt the key for safekeeping, but then I have encryption keys to keep secret, and I’m looking at the same problem all over again.
My entire cluster is always one secret away from being completely compromised.
Given this failing, I need a better way to manage the keys. Ideally, I wouldn’t need to keep the key myself. In fact, the only things I need to know should be disposable secrets that are replaceable, manageable, and low stress. If access secrets are compromised, I’d like the ability to lock down access to all secrets and prevent damage to my cluster environment.
Minimal Requirements for Key Management
- Keys are kept in a central repository
- Access to the repository is managed
- Access to secrets (our keys) is auditable
A Solution: Hashicorp’s Vault
Some searching around the web showed me that HashiCorp’s Vault is suited to my lightweight key-management needs. It is distributed as an all-in-one client-server binary that you just unpack in place. A Vault server can accommodate multiple vaults, and each vault can be managed by one or more people to control access to that vault’s secrets at a fine-grained level. Rounding out my criteria, Vault is also fully auditable.
Implementing
My plan now is to build a Host-Only Network to house three servers: vault, ansible, and webserver. We won’t be using the webserver VM much, but it exists to illustrate the power of a very simple Ansible/Vault integration. If you’re unfamiliar with the construction of a Host-Only Network, you should follow the link to my blog post and give it a read.
And, indeed, I’ll try to keep this very simple. High-level tools and integrations (Consul, for example) exist to help us on our way once we understand this process at a low level. My intention is to take the mystery out of an Ansible/Vault integration, not to present a bulletproof one.
I built a Vagrantfile to launch and provision my servers, ready to go. Take a moment to familiarize yourself with this file and we’ll go over it momentarily. Note that I’m using the 192.168.33.0/24 network.
I’ve also made available the scripts used during the demo, which are in my GitHub repo here.
Those Bootstrap Keys (stupid and stupid.pub)
Before moving on, I’d like to take a moment to address something you may see in the Vagrantfile. It’s true that I’m keeping ssh keys in my public github repo: stupid and stupid.pub. These are bootstrapping keys that are loaded onto the servers in my cluster before we begin. The point of this exercise is to replace these keys with something less… er… well… stupid.
Over and over, I’ll point at the difference between requirements for the security of the private key (stupid) and the public key (stupid.pub). I’ll start pointing to this now by saying: keep the private key secured. The public key… well you can do anything you want with that. Post it on pastebin, hire a sky-writer, etc. The public key is made to be distributed, but the security is really about that private key.
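That asymmetry is easy to see for yourself with ssh-keygen. The file names here (demo and demo.pub) are purely illustrative:

```shell
# Generate a throwaway ed25519 keypair with no passphrase.
# "demo" is the private key -- guard it. "demo.pub" is safe to hand out.
ssh-keygen -t ed25519 -f ./demo -N '' -q

# The private key should be readable only by its owner.
chmod 600 ./demo

ls -l demo demo.pub
```

Everything that follows hinges on that split: public keys get distributed to authorized_keys files everywhere, while the one private key is what we need Vault to protect.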
A Peek at the VM: vault
The vault VM is configured with two major elements:
- The Vault Server is configured and running in systemd
- The golem user is set up and ready to be accessed
It all comes from this segment of the Vagrantfile:
yum update -y
yum install zip -y
yum install python -y
yum install git -y
yum install libselinux-python -y
adduser golem
curl -s https://releases.hashicorp.com/vault/0.9.1/vault_0.9.1_linux_amd64.zip?_ga=2.233191262.1530298798.1514997912-403752314.1514997912 > /vault.zip
cd /
unzip vault.zip
mv vault /bin/vault
rm vault.zip
echo "golem ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
mkdir /home/golem/.ssh/
chmod 700 /home/golem/.ssh/
chown golem:golem /home/golem/.ssh/
mkdir /src
cd /src
git clone https://github.com/OPI-doug/public-utils.git
cd public-utils
cp stupid.pub /home/golem/.ssh/authorized_keys
chmod 600 /home/golem/.ssh/authorized_keys
chown golem:golem /home/golem/.ssh/authorized_keys
cp vault-stuff/vault.service /etc/systemd/system/vault.service
mkdir /etc/vault.d/
cp vault-stuff/vconf.conf /etc/vault.d/vconf.conf
systemctl daemon-reload
systemctl start vault
To get the Vault server up, I download and install Vault (from a binary file), create a systemd service file for it, then start it up. Of special note, if you examine the service file, you’ll find this command being run to start the server:
vault server -config=/etc/vault.d/vconf.conf
This command seemed to be conspicuously missing from all the Vault instructions I found. It starts the Vault server and allows connections to be made to it for ‘vault init’ and everything else you end up doing. Most Vault commands are well documented, and ‘vault server’ may be too, but I had to spend time figuring this out, so I thought I’d share the tidbit with you.
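The vconf.conf shipped in the repo is the authoritative copy, but for orientation, a minimal Vault config in that spirit might look like the sketch below. The file storage path is my assumption; TLS is disabled here only because this is a throwaway demo network:

```hcl
# Hypothetical minimal Vault server config (demo only -- never disable TLS
# in production). The storage path is illustrative.
storage "file" {
  path = "/var/vault/data"
}

listener "tcp" {
  address     = "192.168.33.12:8200"
  tls_disable = 1
}
```

This lines up with the VAULT_ADDR of http://192.168.33.12:8200 that gets exported on the ansible VM.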
The other highlight of the provisioning process is setting up the golem user. You’ll see that stupid.pub was dropped into the /home/golem/.ssh/authorized_keys file. This is what allows us to use golem’s private key to log into this server via SSH (and, therefore, Ansible).
Ultimately, our key-refresh process comes down to overwriting the public key found at /home/golem/.ssh/authorized_keys.
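In Ansible terms, that overwrite is a one-task play. The repo’s golem-key-refresh.yaml is the real version; a hypothetical minimal equivalent using the copy module could look like this (the hosts group and source path are my assumptions):

```yaml
# Sketch of a key-refresh play -- paths and host group are illustrative.
- name: golem-key-refresh
  hosts: all
  tasks:
    - name: copy in the golem public key
      copy:
        src: /home/vagrant/files/newgolem.pub   # the freshly generated public key
        dest: /home/golem/.ssh/authorized_keys
        owner: golem
        group: golem
        mode: '0600'
```

Because the play only ships the public half, nothing secret ever lands on the managed hosts.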
A Peek at the VM: ansible
Most of the demonstration will occur on this VM. Four major events happen to the ansible server for provisioning. After provisioning, you should log into the ansible vm:
doug@OPI demo-dir$ vagrant ssh ansible
[vagrant@localhost] ~$
- Vault gets installed and configured for client-side use.
- Ansible gets installed and configured for our cluster.
- The golem user isn’t created here, but the private key is left in our /home/vagrant directory for use with Ansible.
- All necessary scripts and Ansible Playbooks are left in /home/vagrant
yum update -y
yum install ansible -y
yum install git -y
yum install zip -y
yum install vim -y
curl -s https://releases.hashicorp.com/vault/0.9.1/vault_0.9.1_linux_amd64.zip?_ga=2.233191262.1530298798.1514997912-403752314.1514997912 > /vault.zip
cd /
unzip vault.zip
mv vault /bin/vault
rm vault.zip
mkdir /src
cd /src
git clone https://github.com/OPI-doug/public-utils.git
cd public-utils
cp stupid /home/vagrant/golem-key
chmod 600 /home/vagrant/golem-key
chown vagrant:vagrant /home/vagrant/golem-key
cp ansible-stuff/hosts /etc/ansible/hosts
echo "export VAULT_ADDR=http://192.168.33.12:8200" >> /etc/bashrc
mkdir /home/vagrant/files
chmod 755 /home/vagrant/files
chown vagrant:vagrant /home/vagrant/files
cp ansible-stuff/golem-key-refresh.yaml /home/vagrant/golem-key-refresh.yaml
chmod 644 /home/vagrant/golem-key-refresh.yaml
chown vagrant:vagrant /home/vagrant/golem-key-refresh.yaml
cp ansible-stuff/vaultinit.sh /home/vagrant/vaultinit.sh
chmod 755 /home/vagrant/vaultinit.sh
chown vagrant:vagrant /home/vagrant/vaultinit.sh
cp ansible-stuff/ansiblew.sh /home/vagrant/ansiblew.sh
chmod 755 /home/vagrant/ansiblew.sh
chown vagrant:vagrant /home/vagrant/ansiblew.sh
cp ansible-stuff/keyrefreshw.sh /home/vagrant/keyrefreshw.sh
chmod 755 /home/vagrant/keyrefreshw.sh
chown vagrant:vagrant /home/vagrant/keyrefreshw.sh
cp ansible-stuff/sshconfig /home/vagrant/.ssh/config
chmod 600 /home/vagrant/.ssh/config
chown vagrant:vagrant /home/vagrant/.ssh/config
Aside from the simple pleasure of provisioning the ansible VM, not a whole lot of magic is presented here. Our scripts are in place for the demo, and Ansible is fully configured and ready to go with a hosts file in /etc/ansible/.
Important to this demo, /home/vagrant/golem-key exists. This is a copy of ‘stupid’, the golem private key, and will be overwritten shortly (during the demo).
A Peek at the VM: webserver
There are only two things happening on the webserver.
- Install httpd and run it.
- Install the golem automation user and provide the public key.
This server doesn’t even need Vault installed; it gets a pretty minimal setup. There is also a port forwarded from the host machine so we can see the standard Fedora index.html page by browsing to ‘localhost:8080’.
Process Walkthrough
Now that the VMs have been provisioned simply by typing “vagrant up”, we can move on to the demonstration. This process has a number of steps to successfully illustrate the Vault/Ansible integration and key-refresh process.
- Initialize the Vault.
- Unseal the Vault.
- Auth the Vault.
- Enable Auditing.
- Refresh the keys with an Ansible Wrapper.
- Use the new keys with another Ansible Wrapper.
- Look at the audit logs.
Initialize the Vault and Unseal It
Since a freshly installed Vault server comes with no initialized vaults, it’s now time to set one up. Note that each vault can have a number of owners, with the owners being those people who hold the keys. There are concepts of ‘key shares’ and ‘key thresholds’ to understand: if ‘key shares’ == 3 and ‘key threshold’ == 2, then 3 key shares are created and any 2 of them are required to unseal the vault. First, we use vaultinit.sh to initialize the vault and capture the necessary tokens. We set both the shares and the threshold to 1, just to take a minimalist approach for this walkthrough.
#! /bin/bash
vault init -key-shares=1 -key-threshold=1 > init.log
cat init.log | grep Unseal | awk '{print $4}' > unseal.token
cat init.log | grep Root | awk '{print $4}' > root.token
After this step is complete, we should be able to unseal the vault.
$ vault unseal `cat unseal.token`
Auth
Now the Vault is unsealed and we can use our Auth Token to grant us access.
$ vault auth `cat root.token`
Worst Practice: Using the Root Auth Token
The Root token we just authed with has power over the entire vault we just initialized. By all rights, we should be using it only to create less-privileged tokens, and then using those tokens in this process. For now, though, it’s enough to remember that we have much finer-grained token control available to us than we’re using. In the interest of simplicity, I’m going to stick with the Root token, which should never be done on a production system.
Enable Auditing
After authorization is granted, we are able to tell our Vault to go ahead and begin keeping audit logs. A number of audit back-ends are available, but I’m going to use syslog.
$ vault audit-enable syslog
Now our vault is adding request/response information to its logs for auditing. We’ll take a look at these soon.
Refresh the Keys
It’s time to examine the keyrefreshw.sh script. This script accomplishes a few things at the command-line level: it creates a new set of SSH keys, uses Ansible to distribute the public key, then cleans up after itself. After it runs, you’ll note that we no longer have a ‘golem-key’ file in /home/vagrant.
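The repo holds the real script; the sketch below captures its shape, with the specifics (key type, the old-key flag, exact paths) being my assumptions. The secret paths and playbook name come from the output further down. Writing the sketch to a file and syntax-checking it keeps this honest without needing a live Vault:

```shell
# Sketch of a key-refresh wrapper (assumed details; the real script is
# keyrefreshw.sh in the repo). Written to a file so we can syntax-check it.
cat > keyrefresh-sketch.sh <<'EOF'
#! /bin/bash
set -e

# 1. Mint a brand-new keypair for golem.
ssh-keygen -t rsa -f files/newgolem -N '' -q

# 2. Stash both halves in Vault under temporary names.
vault write secret/newgolemkey value=@files/newgolem
vault write secret/newgolemkeypub value=@files/newgolem.pub

# 3. Push the new public key into every host's authorized_keys,
#    using the OLD private key one last time.
ansible-playbook --private-key golem-key golem-key-refresh.yaml

# 4. Promote the new secrets and scrub local copies of the private key.
vault write secret/golemkey value=@files/newgolem
vault write secret/golemkeypub value=@files/newgolem.pub
rm -f files/newgolem golem-key
EOF

bash -n keyrefresh-sketch.sh && echo "sketch parses"
```

The crucial ordering: the old key is still usable until step 3 completes, and after step 4 the only copy of the new private key lives in Vault.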
[vagrant@localhost ~]$ ./keyrefreshw.sh
Success! Data written to: secret/newgolemkey
Success! Data written to: secret/newgolemkeypub

PLAY [golem-key-refresh] *******************************************************

TASK [Gathering Facts] *********************************************************
ok: [192.168.33.14]
ok: [192.168.33.12]

TASK [copy in the golem public key] ********************************************
changed: [192.168.33.12]
changed: [192.168.33.14]

PLAY RECAP *********************************************************************
192.168.33.12              : ok=2    changed=1    unreachable=0    failed=0
192.168.33.14              : ok=2    changed=1    unreachable=0    failed=0

Success! Data written to: secret/golemkey
Success! Data written to: secret/golemkeypub
The process is a success, so it is now impossible to access a golem account without first pulling the private key out of Vault.
Use the New Keys
Since we have no private-key anymore, the only way to access golem is to pull a key from vault first. I use a simple bash script to do this (ansiblew.sh). It creates the key, runs the ansible command, and then destroys the key. Let’s try it:
[vagrant@localhost ~]$ ./ansiblew.sh
192.168.33.12 | SUCCESS | rc=0 >>
I worked!

192.168.33.14 | SUCCESS | rc=0 >>
I worked!

[vagrant@localhost ~]$
We can run this as many times as we like, with each run fetching the private key from the Vault before running the ansible command.
Let’s see the Logs
At this point, the only major item left is to examine the audit logs we enabled earlier. These logs live on the vault VM; since Vault runs under systemd, journald manages them. We can log out of ansible, into vault, and access the logs with the following short run of commands:
[vagrant@localhost] ~$ exit
doug@OPI demo-dir$ vagrant ssh vault
[vagrant@localhost] ~$ sudo journalctl -f -u vault
Once auditing is enabled, Vault begins logging all of its access requests and responses in JSON format to the system logs. Because we picked syslog, we are free to use all the powers of rsyslogd to process and monitor these access logs. An example entry is below.
Jan 15 20:27:41 localhost.localdomain vault[20681]: {"time":"2018-01-15T20:27:41.486073456Z","type":"request","auth":{"client_token":"hmac-sha256:3a6171150204efa4003a7e34f73efb47e75f7536e78aa185120cee243952cce6","accessor":"hmac-sha256:6d09a91c9eee58388e6828f4c1f092b43e0d4d345033f42ee1a7722cc41ae4f0","display_name":"root","policies":["root"],"metadata":null,"entity_id":""},"request":{"id":"4e4fe614-e095-8452-6ceb-02999d0cf01b","operation":"read","client_token":"hmac-sha256:3a6171150204efa4003a7e34f73efb47e75f7536e78aa185120cee243952cce6","client_token_accessor":"hmac-sha256:6d09a91c9eee58388e6828f4c1f092b43e0d4d345033f42ee1a7722cc41ae4f0","path":"secret/newgolemkey","data":null,"policy_override":false,"remote_address":"192.168.33.13","wrap_ttl":0,"headers":{}},"error":""}
Jan 15 20:27:41 localhost.localdomain vault[20681]: {"time":"2018-01-15T20:27:41.48698996Z","type":"response","auth":{"client_token":"hmac-sha256:3a6171150204efa4003a7e34f73efb47e75f7536e78aa185120cee243952cce6","accessor":"hmac-sha256:6d09a91c9eee58388e6828f4c1f092b43e0d4d345033f42ee1a7722cc41ae4f0","display_name":"root","policies":["root"],"metadata":null,"entity_id":""},"request":{"id":"4e4fe614-e095-8452-6ceb-02999d0cf01b","operation":"read","client_token":"hmac-sha256:3a6171150204efa4003a7e34f73efb47e75f7536e78aa185120cee243952cce6","client_token_accessor":"hmac-sha256:6d09a91c9eee58388e6828f4c1f092b43e0d4d345033f42ee1a7722cc41ae4f0","path":"secret/newgolemkey","data":null,"policy_override":false,"remote_address":"192.168.33.13","wrap_ttl":0,"headers":{}},"response":{"secret":{"lease_id":""},"data":{"value":"hmac-sha256:3dceea3651130ae851db1411376f4b1cd3a0b22c47fdb9aedffb44a19c3d6a3f"}},"error":""}
A few data items are of immediate note. Check through the following list and find the items in the JSON above:
- type: “request” or “response”.
- remote_address: the IP address the request originated from.
- path: the secret requested and returned; above, it’s “secret/newgolemkey”.
- data: a hash with a “value” key; the value is an HMAC of the secret data, not the plaintext.
In a production environment you should understand every field in these JSON entries, but the four above are enough to read the logged requests and responses.
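Because each audit entry is single-line JSON, even plain grep can pull those fields out of the journal. jq would be the nicer tool, but a dependency-free sketch (using an abridged sample of the entry above) looks like this:

```shell
# An abridged audit entry, like the ones journalctl shows.
line='{"type":"request","request":{"path":"secret/newgolemkey","remote_address":"192.168.33.13"}}'

# Pull out the requested secret path and the caller's IP with plain grep.
echo "$line" | grep -o '"path":"[^"]*"'            # -> "path":"secret/newgolemkey"
echo "$line" | grep -o '"remote_address":"[^"]*"'  # -> "remote_address":"192.168.33.13"
```

Piped after `journalctl -u vault`, the same patterns give a quick who-read-what view of the vault.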
Further Study
This is a brief look into the power and usability of Vault as a secret-store for use in an Ansible controlled cluster. There are endless possibilities from here, but a short list springs to mind immediately.
- Use Consul.
- Examine Ansible’s built-in vault (ansible-vault), which seems like a less powerful way to store secrets but deserves to be understood in better detail.
- Vault ships with a built-in SSH secrets engine that should be explored.
- Use of the Vault API could provide a robust suite of CLI tools.
Conclusion and Thanks
Thank you for reading this far. I hope you’ve enjoyed this little demonstration of an extremely basic Ansible/Vault integration. You should now understand this integration at a low-level, allowing you to make decisions about high-level integrations with greater confidence.
I’d like to give special thanks to Mr. Teigen Leonard, without whose support and inquisitive mind I would never have tackled this topic.