Consul on Docker EE

“Consul is a service networking solution to connect and secure services across any runtime platform and public or private cloud” – https://consul.io

This post details a way to run a Consul cluster on Docker EE in Swarm mode. I found it difficult to configure and want to share my solution to help others. The discussion uses Consul 1.3.1 and is not intended to be a complete guide. We’ll also look at how to run multiple clusters for different environments.

Networking

Networking is the most complicated part of running Consul. It uses several ports, both TCP and UDP; some can sit behind a load balancer and others cannot.

Docker networking requires us to declare the ports we use and how to expose them. This necessitates understanding ingress networking and host mode networking.

Ingress networking exposes a port across all workers in the swarm and forwards connections to one of the containers. For example, if there are 20 workers in the swarm and 2 containers, all 20 workers will listen on the ingress port and forward to one of the containers, usually chosen round-robin. A consequence is that only one service can use a given ingress port, so if you want multiple Consul clusters in a swarm, each cluster needs a unique set of ports. Typically a DNS entry or load balancer pointing at all worker nodes is used to reach the service.

Host mode networking exposes a port only on the worker nodes running the container(s) for the service. Unlike ingress networking, the host name or IP address of a specific worker node must be targeted to reach the service. This may seem undesirable, but some Consul ports, such as the Serf ports, do not work behind a load balancer.
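
To make the difference concrete, here is a minimal sketch using the long-form --publish syntax of docker service create (the consul-demo service name is illustrative, not part of the stack below):

# HTTP API: ingress mode, every swarm node listens and load-balances to a task.
# Serf LAN: host mode, only nodes actually running a task listen.
docker service create --name consul-demo \
  --publish published=8500,target=8500,protocol=tcp,mode=ingress \
  --publish published=8301,target=8301,protocol=tcp,mode=host \
  consul:1.3.1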

Ports, from https://www.consul.io/docs/install/ports.html

Name       Purpose          Port   Protocol   Mode
server     Server RPC       8300   TCP        host
Serf LAN   LAN gossip       8301   TCP,UDP    host
Serf WAN   WAN gossip       8302   TCP,UDP    host
HTTP       HTTP API         8500   TCP        ingress
DNS        DNS resolution   8600   TCP,UDP    ingress

Multiple Environments

Supporting multiple clusters, for example to segment dev, test, and production environments, requires configuring a unique set of ports per cluster. I also recommend a different encryption key for the gossip protocol in each cluster: if an agent is accidentally pointed at another cluster’s ports, the mismatched key will prevent it from joining.
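
For example, a distinct gossip key can be generated and stored as a swarm secret per environment. A minimal sketch, run on a manager node (consul_encrypt_dev matches the stack below; consul_encrypt_test is an assumed name for a second environment):

# Generate a unique gossip encryption key for each environment
docker run --rm consul:1.3.1 keygen | docker secret create consul_encrypt_dev -
docker run --rm consul:1.3.1 keygen | docker secret create consul_encrypt_test -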

Snapshots

Snapshots of Consul data are a necessary backup strategy. They are especially important in the configuration shown here, where each agent’s data lives in the container filesystem and is therefore ephemeral. If too many containers are replaced to retain quorum, a snapshot can be used to get the cluster running again.

The stack in this post takes snapshots at 5-minute intervals and keeps them for 10 days. To keep things simple, the container command for consul-dev-snapshot is an inline sh script; a separate container image using cron could also be built.

The restore procedure requires exec’ing into the consul-dev-snapshot container and then issuing the consul snapshot restore command such as the following:

$ consul snapshot restore -token=$(cat /run/secrets/consul_master_token_dev) -http-addr=docker-app.company.com:${PORT_PREFIX:-800}2 /snapshots/consul.123456789.dat
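
Finding the right container to exec into might look like the following; the consul-dev_ service prefix assumes the stack was deployed under the name consul-dev:

# Which node is running the snapshot task?
docker service ps --format '{{.Node}}' consul-dev_consul-dev-snapshot
# On that node, open a shell in the container
docker exec -it $(docker ps -q -f name=consul-dev-snapshot) /bin/sh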

Ensure your persistent volume solution has the resiliency you need.

Stack

Here’s a stack to start from. The consul_dev network will need to be created beforehand with swarm scope. The com.docker labels configure networking and the load balancer for accessing the Consul UI and API endpoint. You will need to update CONSUL_LOCAL_CONFIG with your own tokens. Another option is to create a custom image based on consul:1.3.1 that generates the Consul configuration with a script; this post includes the config inline for simplicity, since wiring up the secrets isn’t the difficult part.
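
Creating that network is a one-liner on a manager node, where overlay networks are swarm-scoped by default:

# Swarm-scoped overlay network for the Consul cluster
docker network create --driver overlay consul_dev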

The volume configuration will depend on how your Docker EE installation integrates with persistent storage.

Consul Stack

version: '3.7'

networks:
  consul_dev:
    external: true

services:
  consul-dev:
    image: "consul:1.3.1"
    networks:
      consul_dev: {}
    deploy:
      replicas: 3
      labels:
        - com.docker.ucp.access.label=/dev
        - com.docker.lb.hosts=consul.company.com
        - com.docker.lb.port=${PORT_PREFIX:-800}2
        - com.docker.lb.network=consul_dev
        - com.docker.lb.service_cluster=dev
        - com.docker.lb.ssl_cert=consul.crt
        - com.docker.lb.ssl_key=consul.key
      update_config:
        parallelism: 1
        failure_action: rollback
        delay: 120s
        monitor: 60s
        order: start-first
      rollback_config:
        parallelism: 1
        failure_action: pause
        delay: 120s
        monitor: 60s
        order: start-first
      placement:
        preferences:
          - spread: node.hostname
      resources:
        limits:
          cpus: '0.5'
          memory: 512M
        reservations:
          cpus: '0.2'
          memory: 128M
      restart_policy:
        condition: any
        delay: 5s
        window: 120s
    environment:
      CONSUL_LOCAL_CONFIG: '{"disable_update_check": true, "disable_host_node_id": true, "acl_datacenter": "ho", "acl_default_policy": "deny", "acl_token": "anonymous", "acl_master_token": "6fbc4fca-c947-4c23-b42b-0a4616fd9964", "acl_agent_token": "3a1d488b-51ed-4ad2-8796-d6e5bdd43e14"}'
    volumes:
      - /etc/hostname:/tmp/hostname
      - /etc/hosts:/tmp/hosts
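    # Bootstrap a 3-server cluster: join peers via the tasks.consul-dev DNS name,
    # advertise this node's IP (read from the bind-mounted host /etc/hosts) so the
    # host-mode Serf ports are reachable, and derive all ports from PORT_PREFIX.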
    command: >
      /bin/sh -c "consul agent -server
      -datacenter=dc1
      -retry-join=tasks.consul-dev:${PORT_PREFIX:-800}1
      -bootstrap-expect=3
      -advertise=$(cat /tmp/hosts | grep -v ^127[.] | cut -d ' ' -f 1 | head -n 1)
      -node=master-$(cat /tmp/hostname)
      -client=0.0.0.0
      -data-dir=/consul/data
      -config-dir=/consul/config
      -ui
      -server-port=${PORT_PREFIX:-800}0
      -serf-lan-port=${PORT_PREFIX:-800}1
      -http-port=${PORT_PREFIX:-800}2
      -dns-port=${PORT_PREFIX:-800}3
      -serf-wan-port=${PORT_PREFIX:-800}4
      -log-level=info
      -encrypt=$(cat /run/secrets/consul_encrypt_dev)
      "
    ports:
      - target: ${PORT_PREFIX:-800}0
        published: ${PORT_PREFIX:-800}0
        protocol: tcp
        mode: host
      - target: ${PORT_PREFIX:-800}1
        published: ${PORT_PREFIX:-800}1
        protocol: tcp
        mode: host
      - target: ${PORT_PREFIX:-800}1
        published: ${PORT_PREFIX:-800}1
        protocol: udp
        mode: host
      - target: ${PORT_PREFIX:-800}2
        published: ${PORT_PREFIX:-800}2
        protocol: tcp
        mode: ingress
      - target: ${PORT_PREFIX:-800}3
        published: ${PORT_PREFIX:-800}3
        protocol: tcp
        mode: ingress
      - target: ${PORT_PREFIX:-800}3
        published: ${PORT_PREFIX:-800}3
        protocol: udp
        mode: ingress
      - target: ${PORT_PREFIX:-800}4
        published: ${PORT_PREFIX:-800}4
        protocol: tcp
        mode: host
      - target: ${PORT_PREFIX:-800}4
        published: ${PORT_PREFIX:-800}4
        protocol: udp
        mode: host
    healthcheck:
      test: ["CMD", "/usr/bin/curl", "-f", "http://127.0.0.1:${PORT_PREFIX:-800}2/v1/status/peers"]
      interval: 1m30s
      timeout: 10s
      retries: 3
      start_period: 40s
    secrets:
      - consul_encrypt_dev
      - consul_master_token_dev
      - consul_agent_token_dev

  consul-dev-snapshot:
    image: "consul:1.3.1"
    networks:
      consul_dev: {}
    deploy:
      replicas: 1
      labels: [com.docker.ucp.access.label=/dev]
      resources:
        limits:
          cpus: '0.5'
          memory: 128M
        reservations:
          cpus: '0.2'
          memory: 32M
      restart_policy:
        condition: any
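    # Snapshot every 5 minutes; prune snapshots older than 10 days.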
    command: >
      /bin/sh -c "while true;
          do
            echo Taking Consul snapshot;
            consul snapshot save -token=$(cat /run/secrets/consul_master_token_dev) -http-addr=docker-app.company.com:${PORT_PREFIX:-800}2 /snapshots/consul.$(date -Iminutes).dat;
            echo Pruning old snapshots;
            find /snapshots -mtime +10 -delete -print;
            sleep 300s;
          done
      "
    volumes:
      - type: volume
        source: snapshotsdev
        target: /snapshots
    secrets:
      - consul_master_token_dev

volumes:
  snapshotsdev:
    driver: "local"
    labels:
      - com.docker.ucp.access.label=/dev

secrets:
  consul_encrypt_dev:
    external: true
  consul_master_token_dev:
    external: true
  consul_agent_token_dev:
    external: true
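
To deploy, note that docker stack deploy does not substitute ${PORT_PREFIX} on its own; one approach, assuming the file is saved as consul-dev.yml, is to render it with docker-compose first:

# Render variables, then deploy the stack as consul-dev
PORT_PREFIX=800 docker-compose -f consul-dev.yml config | docker stack deploy -c - consul-dev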

It’s important (and unfortunate) to note that Docker EE in swarm mode and Docker CE in swarm mode do not operate the same way. For example, the tasks.consul-dev name used for service discovery does not work in Docker CE at the time of this writing.

Enjoy your fault-tolerant Consul cluster!
