Consul on Docker EE

“Consul is a service networking solution to connect and secure services across any runtime platform and public or private cloud” – https://consul.io

This post details a way to run a Consul cluster on Docker EE in Swarm mode. I found it difficult to configure and want to share my solution to help others. The discussion will use Consul 1.3.1 and is not intended to be a complete guide. We’ll also consider how to handle multiple clusters for different environments.

Networking

Networking is the most complicated part of running Consul on Docker. Consul uses several ports, both TCP and UDP; some can sit behind a load balancer and others cannot.

Docker networking requires us to declare the ports we use and how to expose them. This necessitates understanding ingress networking and host mode networking.

Ingress networking exposes a port across all workers in the swarm and forwards connections to one of the containers. For example, if there are 20 workers in the swarm and 2 containers, all 20 workers will listen on the ingress port and forward to one of the containers, usually chosen round-robin. A consequence is that only one service can use a given ingress port, so if you want multiple Consul clusters in a swarm, each cluster must specify a unique set of ports. Typically a DNS entry or load balancer pointing at all worker nodes is used to reach the service.

Host mode networking exposes a port only on the worker nodes running the container(s) for the service. Unlike ingress networking, you must target the host name or IP address of a specific worker node to reach the service. This may seem undesirable, but some Consul ports, such as the serf ports, do not work behind a load balancer.
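The distinction shows up in how a port is published. As a minimal sketch using the long --publish syntax of docker service create (the service name here is illustrative; the stack below expresses the same thing in compose syntax):

$ docker service create --name consul-demo \
    --publish published=8500,target=8500,protocol=tcp,mode=ingress \
    --publish published=8301,target=8301,protocol=tcp,mode=host \
    --publish published=8301,target=8301,protocol=udp,mode=host \
    consul:1.3.1

The HTTP API port is reachable through any worker, while the serf LAN port is reachable only on the nodes actually running a task.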

Ports, from https://www.consul.io/docs/install/ports.html

Name     Purpose          Port   TCP/UDP   Mode
Server   Server RPC       8300   TCP,UDP   host
LAN      Serf LAN         8301   TCP,UDP   host
WAN      Serf WAN         8302   TCP,UDP   host
HTTP     HTTP API         8500   TCP       ingress
DNS      DNS Resolution   8600   TCP,UDP   ingress

Multiple Environments

Supporting multiple clusters, for example to segment dev, test, and production environments, requires configuring a unique set of ports for each cluster. I also recommend a different encryption key for the gossip protocol per cluster: if an agent is accidentally pointed at another cluster's ports, the mismatched key will prevent it from joining.
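For example, each environment can get its own gossip key stored as a Docker secret. The dev secret name below matches the stack later in this post; the test name is illustrative:

$ consul keygen | docker secret create consul_encrypt_dev -
$ consul keygen | docker secret create consul_encrypt_test -

Each environment's stack is then deployed with its own PORT_PREFIX value so the port sets don't collide.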

Snapshots

Taking snapshots of Consul data is a necessary backup strategy. It is especially important in the configuration shown here, where each agent's data lives in the container filesystem and is therefore ephemeral. If too many containers are replaced to retain quorum, a snapshot can be used to get the cluster running again.

The stack in this post takes snapshots at five-minute intervals and keeps them for 10 days. To keep things simple, the container command for consul-dev-snapshot is an inline sh loop; a separate container image using cron could be built instead.

The restore procedure requires exec’ing into the consul-dev-snapshot container and then issuing the consul snapshot restore command.
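Getting a shell in the container might look like the following, run on the node hosting the snapshot task (the container name will vary):

$ docker ps --filter name=consul-dev-snapshot --format '{{.Names}}'
$ docker exec -it <container-name> sh

Once inside, issue the restore command, for example: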

$ consul snapshot restore -token=$(cat /run/secrets/consul_master_token_dev) -http-addr=docker-app.company.com:${PORT_PREFIX:-800}2 /snapshots/consul.123456789.dat

Ensure your persistent volume solution has the resiliency you need.

Stack

Here’s a stack to start from. The consul_dev network will need to be created beforehand with swarm scope. The com.docker labels configure networking and the load balancer for accessing the Consul UI and API endpoint. You will need to update the CONSUL_LOCAL_CONFIG tokens for your own deployment. Another option is to create a custom image based on consul:1.3.1 that generates the Consul configuration with a script; this post includes the config inline for simplicity, since configuring the secrets isn’t the difficult part.
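Creating the network might look like this; overlay networks are swarm-scoped by default:

$ docker network create --driver overlay consul_dev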

The volume configuration will depend on how your Docker EE installation integrates with persistent storage.

Consul Stack

version: '3.7'

networks:
  consul_dev:
    external: true

services:
  consul-dev:
    image: "consul:1.3.1"
    networks:
      consul_dev: {}
    deploy:
      replicas: 3
      labels:
        - com.docker.ucp.access.label=/dev
        - com.docker.lb.hosts=consul.company.com
        - com.docker.lb.port=${PORT_PREFIX:-800}2
        - com.docker.lb.network=consul_dev
        - com.docker.lb.service_cluster=dev
        - com.docker.lb.ssl_cert=consul.crt
        - com.docker.lb.ssl_key=consul.key
      update_config:
        parallelism: 1
        failure_action: rollback
        delay: 120s
        monitor: 60s
        order: start-first
      rollback_config:
        parallelism: 1
        failure_action: pause
        delay: 120s
        monitor: 60s
        order: start-first
      placement:
        preferences:
          - spread: node.hostname
      resources:
        limits:
          cpus: '0.5'
          memory: 512M
        reservations:
          cpus: '0.2'
          memory: 128M
      restart_policy:
        condition: any
        delay: 5s
        window: 120s
    environment:
      CONSUL_LOCAL_CONFIG: '{"disable_update_check": true, "disable_host_node_id": true, "acl_datacenter": "dc1", "acl_default_policy": "deny", "acl_token": "anonymous", "acl_master_token": "6fbc4fca-c947-4c23-b42b-0a4616fd9964", "acl_agent_token": "3a1d488b-51ed-4ad2-8796-d6e5bdd43e14"}'
    volumes:
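      # /etc/hostname and /etc/hosts are mounted so the agent command
      # below can derive the node name and advertise address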
      - /etc/hostname:/tmp/hostname
      - /etc/hosts:/tmp/hosts
    command: >
      /bin/sh -c "consul agent -server
      -datacenter=dc1
      -retry-join=tasks.consul-dev:${PORT_PREFIX:-800}1
      -bootstrap-expect=3
      -advertise=$(cat /tmp/hosts | grep -v ^127[.] | cut -d ' ' -f 1 | head -n 1)
      -node=master-$(cat /tmp/hostname)
      -client=0.0.0.0
      -data-dir=/consul/data
      -config-dir=/consul/config
      -ui
      -server-port=${PORT_PREFIX:-800}0
      -serf-lan-port=${PORT_PREFIX:-800}1
      -http-port=${PORT_PREFIX:-800}2
      -dns-port=${PORT_PREFIX:-800}3
      -serf-wan-port=${PORT_PREFIX:-800}4
      -log-level=info
      -encrypt=$(cat /run/secrets/consul_encrypt_dev)
      "
    ports:
      - target: ${PORT_PREFIX:-800}0
        published: ${PORT_PREFIX:-800}0
        protocol: tcp
        mode: host
      - target: ${PORT_PREFIX:-800}1
        published: ${PORT_PREFIX:-800}1
        protocol: tcp
        mode: host
      - target: ${PORT_PREFIX:-800}1
        published: ${PORT_PREFIX:-800}1
        protocol: udp
        mode: host
      - target: ${PORT_PREFIX:-800}2
        published: ${PORT_PREFIX:-800}2
        protocol: tcp
        mode: ingress
      - target: ${PORT_PREFIX:-800}3
        published: ${PORT_PREFIX:-800}3
        protocol: tcp
        mode: ingress
      - target: ${PORT_PREFIX:-800}3
        published: ${PORT_PREFIX:-800}3
        protocol: udp
        mode: ingress
      - target: ${PORT_PREFIX:-800}4
        published: ${PORT_PREFIX:-800}4
        protocol: tcp
        mode: host
      - target: ${PORT_PREFIX:-800}4
        published: ${PORT_PREFIX:-800}4
        protocol: udp
        mode: host
    healthcheck:
      test: ["CMD", "/usr/bin/curl", "-f", "http://127.0.0.1:${PORT_PREFIX:-800}2/v1/status/peers"]
      interval: 1m30s
      timeout: 10s
      retries: 3
      start_period: 40s
    secrets:
      - consul_encrypt_dev
      - consul_master_token_dev
      - consul_agent_token_dev

  consul-dev-snapshot:
    image: "consul:1.3.1"
    networks:
      consul_dev: {}
    deploy:
      replicas: 1
      labels: [com.docker.ucp.access.label=/dev]
      resources:
        limits:
          cpus: '0.5'
          memory: 128M
        reservations:
          cpus: '0.2'
          memory: 32M
      restart_policy:
        condition: any
    command: >
      /bin/sh -c "while true;
          do
            echo Taking Consul snapshot;
            consul snapshot save -token=$(cat /run/secrets/consul_master_token_dev) -http-addr=docker-app.company.com:${PORT_PREFIX:-800}2 /snapshots/consul.$(date -Iminutes).dat;
            echo Pruning old snapshots;
            find /snapshots -mtime +10 -delete -print;
            sleep 300s;
          done
      "
    volumes:
      - type: volume
        source: snapshotsdev
        target: /snapshots
    secrets:
      - consul_master_token_dev

volumes:
  snapshotsdev:
    driver: "local"
    labels:
      - com.docker.ucp.access.label=/dev

secrets:
  consul_encrypt_dev:
    external: true
  consul_master_token_dev:
    external: true
  consul_agent_token_dev:
    external: true
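The external secrets must exist before deploying. The gossip key was created earlier; the ACL token secrets can be created similarly (the uuidgen values here are illustrative placeholders):

$ uuidgen | docker secret create consul_master_token_dev -
$ uuidgen | docker secret create consul_agent_token_dev -

Then deploy the stack, assuming your tooling performs compose variable substitution (the file name is a placeholder):

$ PORT_PREFIX=800 docker stack deploy -c consul-dev.yml consul-dev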

It’s important (and unfortunate) to note that Docker EE and Docker CE do not behave identically in swarm mode. For example, the tasks.consul-dev name used here for service discovery does not work in Docker CE at the time of this writing.

Enjoy your fault tolerant Consul cluster!

About the Author


Patrick Double

Principal Technologist

I have been coding since 6th grade, circa 1986, and professionally since 1998, when I graduated from the University of Nebraska-Lincoln. Most of my career has been in web applications using JEE. I work the entire stack from user interface to database. I especially like solving application security and high availability problems.
