Consul on Docker EE

“Consul is a service networking solution to connect and secure services across any runtime platform and public or private cloud” – https://consul.io

This post details a way to run a Consul cluster on Docker EE in Swarm mode. I found it difficult to configure and want to share my solution to help others. The discussion will use Consul 1.3.1 and is not intended to be a complete guide. We’ll also consider how to handle multiple clusters for different environments.

Networking

The networking of Consul is the most complicated part. Consul uses several ports, both TCP and UDP; some of them can sit behind a load balancer and others cannot.

Docker networking requires us to declare the ports we use and how to expose them. This necessitates understanding ingress networking and host mode networking.

Ingress networking exposes a port across all workers in the swarm and forwards connections to one of the containers. For example, with 20 workers in the swarm and 2 containers, all 20 workers listen on the ingress port and forward to a container, usually chosen round-robin. A consequence is that only one service can use a given ingress port; if you want multiple Consul clusters in a swarm, each cluster needs a unique set of ports. Typically a DNS entry or load balancer pointing at all worker nodes is used to reach the service.

Host mode networking exposes a port only on the worker nodes running the container(s) for the service. Unlike ingress networking, you must target the host name or IP address of a worker node running the container to reach the service. This may seem undesirable, but some Consul ports, such as the serf ports, do not work behind a load balancer.
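The distinction shows up directly in the compose file's long-form port syntax. A minimal sketch using the default Consul ports (the full stack below derives its ports from a PORT_PREFIX variable instead):

```yaml
ports:
  - target: 8500        # HTTP API: safe behind a load balancer
    published: 8500
    protocol: tcp
    mode: ingress       # every worker listens and routes to a task
  - target: 8301        # Serf LAN: must reach one specific agent
    published: 8301
    protocol: tcp
    mode: host          # only workers running a task listen
```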

Ports, from https://www.consul.io/docs/install/ports.html

Name      Purpose         Port   TCP/UDP   Mode
server    Server RPC      8300   TCP       host
Serf LAN  LAN gossip      8301   TCP,UDP   host
Serf WAN  WAN gossip      8302   TCP,UDP   host
HTTP      HTTP API        8500   TCP       ingress
DNS       DNS resolution  8600   TCP,UDP   ingress

Multiple Environments

Supporting multiple clusters, for example to segment dev, test, and production environments, requires configuring a unique set of ports per cluster. I also recommend a different gossip encryption key for each environment: if an agent is misconfigured to point at another cluster's ports, the mismatched encryption key will prevent it from joining.
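As a sketch of the port scheme used in the stack below: each environment picks a unique PORT_PREFIX and a final digit is appended per Consul port. The prefix 801 here is a hypothetical choice for a second (test) cluster; the dev stack defaults to 800.

```shell
PORT_PREFIX=801   # hypothetical prefix for a "test" cluster; dev defaults to 800
echo "server:   ${PORT_PREFIX}0"   # -server-port
echo "serf-lan: ${PORT_PREFIX}1"   # -serf-lan-port
echo "http:     ${PORT_PREFIX}2"   # -http-port
echo "dns:      ${PORT_PREFIX}3"   # -dns-port
echo "serf-wan: ${PORT_PREFIX}4"   # -serf-wan-port
```

A distinct gossip key per environment can be generated with `consul keygen` and stored as a Docker secret, as the stack below expects.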

Snapshots

Snapshots of Consul data are necessary as a backup strategy. This is especially important in the configuration shown here, where each agent's data lives in the container filesystem and is therefore ephemeral. If too many containers are replaced to retain quorum, a snapshot can be used to get the cluster running again.

The stack in this post takes snapshots at 5-minute intervals and keeps them for 10 days. To keep things simple, the container command for consul-dev-snapshot is an sh script; a separate container image using cron could be built instead.

The restore procedure requires exec’ing into the consul-dev-snapshot container and then issuing the consul snapshot restore command such as the following:

$ consul snapshot restore -token=$(cat /run/secrets/consul_master_token_dev) -http-addr=docker-app.company.com:${PORT_PREFIX:-800}2 /snapshots/consul.123456789.dat
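When choosing which file to restore, note that the `date -Iminutes` timestamps in the snapshot filenames sort lexicographically, so the newest snapshot can be selected with a plain `sort`. A sketch with sample filenames (the real files live under /snapshots):

```shell
# Sample names following the stack's consul.$(date -Iminutes).dat scheme;
# sort orders them chronologically, tail picks the newest.
printf '%s\n' \
  'consul.2019-01-01T23:59-0600.dat' \
  'consul.2019-01-02T03:04-0600.dat' \
  'consul.2019-01-02T03:09-0600.dat' |
  sort | tail -n 1
```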

Ensure your persistent volume solution has the resiliency you need.

Stack

Here’s a stack to start from. The consul_dev network will need to be created beforehand with swarm scope. The com.docker labels configure networking and the load balancer for accessing the Consul UI and API endpoint. You will need to update the tokens in CONSUL_LOCAL_CONFIG. Another option is to build a custom image based on consul:1.3.1 and generate the Consul configuration with a script; this post includes the config inline for simplicity, since configuring the secrets isn’t the difficult part.

The volume configuration will depend on how your Docker EE installation integrates with persistent storage.
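Before deploying, the external network and secrets the stack references must exist. A sketch, assuming you are on a swarm manager; the uuidgen-generated tokens are placeholders for whatever token scheme you use, and the master token secret should match the acl_master_token in CONSUL_LOCAL_CONFIG:

```shell
# Swarm-scoped overlay network the stack attaches to
docker network create --driver overlay --scope swarm consul_dev

# Gossip encryption key and ACL tokens, stored as external secrets
consul keygen | docker secret create consul_encrypt_dev -
uuidgen | docker secret create consul_master_token_dev -
uuidgen | docker secret create consul_agent_token_dev -
```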

Consul Stack

version: '3.7'

networks:
  consul_dev:
    external: true

services:
  consul-dev:
    image: "consul:1.3.1"
    networks:
      consul_dev: {}
    deploy:
      replicas: 3
      labels:
        - com.docker.ucp.access.label=/dev
        - com.docker.lb.hosts=consul.company.com
        - com.docker.lb.port=${PORT_PREFIX:-800}2
        - com.docker.lb.network=consul_dev
        - com.docker.lb.service_cluster=dev
        - com.docker.lb.ssl_cert=consul.crt
        - com.docker.lb.ssl_key=consul.key
      update_config:
        parallelism: 1
        failure_action: rollback
        delay: 120s
        monitor: 60s
        order: start-first
      rollback_config:
        parallelism: 1
        failure_action: pause
        delay: 120s
        monitor: 60s
        order: start-first
      placement:
        preferences:
          - spread: node.hostname
      resources:
        limits:
          cpus: '0.5'
          memory: 512M
        reservations:
          cpus: '0.2'
          memory: 128M
      restart_policy:
        condition: any
        delay: 5s
        window: 120s
    environment:
      CONSUL_LOCAL_CONFIG: '{"disable_update_check": true, "disable_host_node_id": true, "acl_datacenter": "dc1", "acl_default_policy": "deny", "acl_token": "anonymous", "acl_master_token": "6fbc4fca-c947-4c23-b42b-0a4616fd9964", "acl_agent_token": "3a1d488b-51ed-4ad2-8796-d6e5bdd43e14"}'
    volumes:
      - /etc/hostname:/tmp/hostname
      - /etc/hosts:/tmp/hosts
    command: >
      /bin/sh -c "consul agent -server
      -datacenter=dc1
      -retry-join=tasks.consul-dev:${PORT_PREFIX:-800}1
      -bootstrap-expect=3
      -advertise=$(cat /tmp/hosts | grep -v ^127[.] | cut -d ' ' -f 1 | head -n 1)
      -node=master-$(cat /tmp/hostname)
      -client=0.0.0.0
      -data-dir=/consul/data
      -config-dir=/consul/config
      -ui
      -server-port=${PORT_PREFIX:-800}0
      -serf-lan-port=${PORT_PREFIX:-800}1
      -http-port=${PORT_PREFIX:-800}2
      -dns-port=${PORT_PREFIX:-800}3
      -serf-wan-port=${PORT_PREFIX:-800}4
      -log-level=info
      -encrypt=$(cat /run/secrets/consul_encrypt_dev)
      "
    ports:
      - target: ${PORT_PREFIX:-800}0
        published: ${PORT_PREFIX:-800}0
        protocol: tcp
        mode: host
      - target: ${PORT_PREFIX:-800}1
        published: ${PORT_PREFIX:-800}1
        protocol: tcp
        mode: host
      - target: ${PORT_PREFIX:-800}1
        published: ${PORT_PREFIX:-800}1
        protocol: udp
        mode: host
      - target: ${PORT_PREFIX:-800}2
        published: ${PORT_PREFIX:-800}2
        protocol: tcp
        mode: ingress
      - target: ${PORT_PREFIX:-800}3
        published: ${PORT_PREFIX:-800}3
        protocol: tcp
        mode: ingress
      - target: ${PORT_PREFIX:-800}3
        published: ${PORT_PREFIX:-800}3
        protocol: udp
        mode: ingress
      - target: ${PORT_PREFIX:-800}4
        published: ${PORT_PREFIX:-800}4
        protocol: tcp
        mode: host
      - target: ${PORT_PREFIX:-800}4
        published: ${PORT_PREFIX:-800}4
        protocol: udp
        mode: host
    healthcheck:
      test: ["CMD", "/usr/bin/curl", "-f", "http://127.0.0.1:${PORT_PREFIX:-800}2/v1/status/peers"]
      interval: 1m30s
      timeout: 10s
      retries: 3
      start_period: 40s
    secrets:
      - consul_encrypt_dev
      - consul_master_token_dev
      - consul_agent_token_dev

  consul-dev-snapshot:
    image: "consul:1.3.1"
    networks:
      consul_dev: {}
    deploy:
      replicas: 1
      labels: [com.docker.ucp.access.label=/dev]
      resources:
        limits:
          cpus: '0.5'
          memory: 128M
        reservations:
          cpus: '0.2'
          memory: 32M
      restart_policy:
        condition: any
    command: >
      /bin/sh -c "while true;
          do
            echo Taking Consul snapshot;
            consul snapshot save -token=$(cat /run/secrets/consul_master_token_dev) -http-addr=docker-app.company.com:${PORT_PREFIX:-800}2 /snapshots/consul.$(date -Iminutes).dat;
            echo Pruning old snapshots;
            find /snapshots -mtime +10 -delete -print;
            sleep 300s;
          done
      "
    volumes:
      - type: volume
        source: snapshotsdev
        target: /snapshots
    secrets:
      - consul_master_token_dev

volumes:
  snapshotsdev:
    driver: "local"
    labels:
      - com.docker.ucp.access.label=/dev

secrets:
  consul_encrypt_dev:
    external: true
  consul_master_token_dev:
    external: true
  consul_agent_token_dev:
    external: true

It’s important (and unfortunate) to note that Docker EE in swarm mode and Docker CE in swarm mode do not behave identically. For example, the tasks.consul-dev name used for service discovery does not work in Docker CE at the time of this writing.

Enjoy your fault tolerant Consul cluster!

About the Author


Patrick Double

Principal Technologist

I have been coding since 6th grade, circa 1986, professionally (i.e. college graduate) since 1998 when I graduated from the University of Nebraska-Lincoln. Most of my career has been in web applications using JEE. I work the entire stack from user interface to database. I especially like solving application security and high availability problems.
