Consul on Docker EE
“Consul is a service networking solution to connect and secure services across any runtime platform and public or private cloud” – https://consul.io
This post details a way to run a Consul cluster on Docker EE in Swarm mode. I found the configuration difficult to get right, so I'm sharing my solution to help others. The discussion uses Consul 1.3.1 and is not intended as a complete guide. We'll also consider how to handle multiple clusters for different environments.
Networking
Networking is the most complicated part of running Consul. Consul uses several ports, both TCP and UDP; some can sit behind a load balancer and others cannot.
Docker networking requires us to declare the ports we use and how to expose them. This necessitates understanding ingress networking and host mode networking.
Ingress networking exposes a port on every worker in the swarm and forwards connections to one of the service's containers. For example, with 20 workers in the swarm and 2 containers, all 20 workers listen on the ingress port and forward to one of the containers, usually chosen round-robin. A consequence is that only one service can use a given ingress port; if you want multiple Consul clusters in one swarm, each cluster needs a unique set of ports. Typically a DNS entry or load balancer pointing at all worker nodes is used to reach the service.
Host mode networking exposes a port only on the worker nodes actually running the service's container(s). Unlike ingress networking, you must target the host name or IP address of a specific worker node to reach the service. This may seem undesirable, but some Consul ports, such as the Serf ports, do not work behind load balancing.
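The difference shows up directly in a compose file's long-form `ports:` section. A minimal sketch (the port numbers here are Consul's defaults for illustration, not the remapped ones used by the stack later in this post):

```yaml
ports:
  # Ingress: every worker listens on 8500 and round-robins to a task.
  - target: 8500
    published: 8500
    protocol: tcp
    mode: ingress
  # Host: only workers actually running a task listen on 8301.
  - target: 8301
    published: 8301
    protocol: tcp
    mode: host
  - target: 8301
    published: 8301
    protocol: udp
    mode: host
```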
Ports, from https://www.consul.io/docs/install/ports.html
Name | Purpose | Port | TCP/UDP | Mode |
---|---|---|---|---|
server | Server RPC | 8300 | TCP | host |
LAN | Serf LAN | 8301 | TCP,UDP | host |
WAN | Serf WAN | 8302 | TCP,UDP | host |
HTTP | HTTP API | 8500 | TCP | ingress |
DNS | DNS Resolution | 8600 | TCP,UDP | ingress |
Multiple Environments
Supporting multiple clusters, for example to segment dev, test, and production environments, requires configuring a unique set of ports for each. I also recommend a different gossip encryption key per cluster: if an agent is accidentally pointed at another cluster's ports, the mismatched key will prevent it from joining.
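One way to keep the per-environment port sets straight is to derive every port from a single numeric prefix, which is what the stack's `PORT_PREFIX` variable does. A hypothetical shell helper illustrating the scheme:

```shell
# Hypothetical helper: derive the five Consul ports from a numeric prefix.
# PORT_PREFIX=800 yields 8000-8004 (dev); 801 yields 8010-8014 (test), etc.
ports_for() {
  prefix="$1"
  echo "server=${prefix}0 serf-lan=${prefix}1 http=${prefix}2 dns=${prefix}3 serf-wan=${prefix}4"
}

ports_for 800   # dev
ports_for 801   # test
```

Each environment then deploys the same stack file with its own prefix and its own gossip key; for example (stack and secret names are assumptions): `consul keygen | docker secret create consul_encrypt_test -` followed by `PORT_PREFIX=801 docker stack deploy -c consul.yml consul-test`.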
Snapshots
Snapshotting Consul data is a necessary backup strategy. It is especially important in the configuration shown here, where each agent's data lives in the container filesystem and is therefore ephemeral. If too many containers are replaced to retain quorum, a snapshot can be used to get the cluster running again.
The stack in this post takes snapshots at 5-minute intervals and keeps them for 10 days. To keep things simple, the container command for consul-dev-snapshot is an sh script; a separate container image using cron could be built instead.
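The pruning step relies on find's `-mtime` test; `-mtime +10` matches files last modified more than 10 days ago. A self-contained sketch of the retention rule against a throwaway directory (the real stack prunes `/snapshots`; GNU touch's relative `-d` dates are assumed here to fabricate an old file):

```shell
# Demonstrate the retention rule in a temporary directory.
dir="$(mktemp -d)"
touch -d '15 days ago' "$dir/consul.old.dat"   # beyond the 10-day window
touch "$dir/consul.new.dat"                    # recent snapshot, retained
# -mtime +10: modification time more than 10 days in the past.
find "$dir" -name 'consul.*.dat' -mtime +10 -delete -print
ls "$dir"
```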
The restore procedure requires exec'ing into the consul-dev-snapshot container and issuing the consul snapshot restore command, such as the following:
$ consul snapshot restore -token=$(cat /run/secrets/consul_master_token_dev) -http-addr=docker-app.company.com:${PORT_PREFIX:-800}2 /snapshots/consul.123456789.dat
Ensure your persistent volume solution has the resiliency you need.
Stack
Here's a stack to start from. The consul_dev network must be created beforehand with swarm scope. The com.docker labels configure networking and the load balancer for accessing the Consul UI and API endpoint. You will need to update CONSUL_LOCAL_CONFIG with your own tokens. Another option is to build a custom image based on consul:1.3.1 and generate the Consul configuration from a script; this post inlines the config for simplicity, since configuring the secrets isn't the difficult part. The volume configuration will depend on how your Docker EE installation integrates with persistent storage.
Consul Stack
```yaml
version: '3.7'

networks:
  consul_dev:
    external: true

services:
  consul-dev:
    image: "consul:1.3.1"
    networks:
      consul_dev: {}
    deploy:
      replicas: 3
      labels:
        - com.docker.ucp.access.label=/dev
        - com.docker.lb.hosts=consul.company.com
        - com.docker.lb.port=${PORT_PREFIX:-800}2
        - com.docker.lb.network=consul_dev
        - com.docker.lb.service_cluster=dev
        - com.docker.lb.ssl_cert=consul.crt
        - com.docker.lb.ssl_key=consul.key
      update_config:
        parallelism: 1
        failure_action: rollback
        delay: 120s
        monitor: 60s
        order: start-first
      rollback_config:
        parallelism: 1
        failure_action: pause
        delay: 120s
        monitor: 60s
        order: start-first
      placement:
        preferences:
          - spread: node.hostname
      resources:
        limits:
          cpus: '0.5'
          memory: 512M
        reservations:
          cpus: '0.2'
          memory: 128M
      restart_policy:
        condition: any
        delay: 5s
        window: 120s
    environment:
      CONSUL_LOCAL_CONFIG: '{"disable_update_check": true, "disable_host_node_id": true, "acl_datacenter": "ho", "acl_default_policy": "deny", "acl_token": "anonymous", "acl_master_token": "6fbc4fca-c947-4c23-b42b-0a4616fd9964", "acl_agent_token": "3a1d488b-51ed-4ad2-8796-d6e5bdd43e14"}'
    volumes:
      - /etc/hostname:/tmp/hostname
      - /etc/hosts:/tmp/hosts
    command: >
      /bin/sh -c "consul agent -server
      -datacenter=dc1
      -retry-join=tasks.consul-dev:${PORT_PREFIX:-800}1
      -bootstrap-expect=3
      -advertise=$(cat /tmp/hosts | grep -v ^127[.] | cut -d ' ' -f 1 | head -n 1)
      -node=master-$(cat /tmp/hostname)
      -client=0.0.0.0
      -data-dir=/consul/data
      -config-dir=/consul/config
      -ui
      -server-port=${PORT_PREFIX:-800}0
      -serf-lan-port=${PORT_PREFIX:-800}1
      -http-port=${PORT_PREFIX:-800}2
      -dns-port=${PORT_PREFIX:-800}3
      -serf-wan-port=${PORT_PREFIX:-800}4
      -log-level=info
      -encrypt=$(cat /run/secrets/consul_encrypt_dev)
      "
    ports:
      # Server RPC: TCP only, host mode
      - target: ${PORT_PREFIX:-800}0
        published: ${PORT_PREFIX:-800}0
        protocol: tcp
        mode: host
      # Serf LAN: TCP and UDP, host mode
      - target: ${PORT_PREFIX:-800}1
        published: ${PORT_PREFIX:-800}1
        protocol: tcp
        mode: host
      - target: ${PORT_PREFIX:-800}1
        published: ${PORT_PREFIX:-800}1
        protocol: udp
        mode: host
      # HTTP API: ingress
      - target: ${PORT_PREFIX:-800}2
        published: ${PORT_PREFIX:-800}2
        protocol: tcp
        mode: ingress
      # DNS: TCP and UDP, ingress
      - target: ${PORT_PREFIX:-800}3
        published: ${PORT_PREFIX:-800}3
        protocol: tcp
        mode: ingress
      - target: ${PORT_PREFIX:-800}3
        published: ${PORT_PREFIX:-800}3
        protocol: udp
        mode: ingress
      # Serf WAN: TCP and UDP, host mode
      - target: ${PORT_PREFIX:-800}4
        published: ${PORT_PREFIX:-800}4
        protocol: tcp
        mode: host
      - target: ${PORT_PREFIX:-800}4
        published: ${PORT_PREFIX:-800}4
        protocol: udp
        mode: host
    healthcheck:
      test: ["CMD", "/usr/bin/curl", "-f", "http://127.0.0.1:${PORT_PREFIX:-800}2/v1/status/peers"]
      interval: 1m30s
      timeout: 10s
      retries: 3
      start_period: 40s
    secrets:
      - consul_encrypt_dev
      - consul_master_token_dev
      - consul_agent_token_dev

  consul-dev-snapshot:
    image: "consul:1.3.1"
    networks:
      consul_dev: {}
    deploy:
      replicas: 1
      labels: [com.docker.ucp.access.label=/dev]
      resources:
        limits:
          cpus: '0.5'
          memory: 128M
        reservations:
          cpus: '0.2'
          memory: 32M
      restart_policy:
        condition: any
    command: >
      /bin/sh -c "while true; do
      echo Taking Consul snapshot;
      consul snapshot save -token=$(cat /run/secrets/consul_master_token_dev)
      -http-addr=docker-app.company.com:${PORT_PREFIX:-800}2
      /snapshots/consul.$(date -Iminutes).dat;
      echo Pruning old snapshots;
      find /snapshots -mtime +10 -delete -print;
      sleep 300s;
      done
      "
    volumes:
      - type: volume
        source: snapshotsdev
        target: /snapshots
    secrets:
      - consul_master_token_dev

volumes:
  snapshotsdev:
    driver: "local"
    labels:
      - com.docker.ucp.access.label=/dev

secrets:
  consul_encrypt_dev:
    external: true
  consul_master_token_dev:
    external: true
  consul_agent_token_dev:
    external: true
```
It's important (and unfortunate) to note that Docker EE in swarm mode and Docker CE in swarm mode do not behave the same. For example, the tasks.consul-dev name used for service discovery does not work in Docker CE at the time of this writing.
Enjoy your fault tolerant Consul cluster!