Cluster Control with Etcd and Docker

Adding and removing hosts for live multi-server sites is a lot easier with technology such as Docker and etcd.

Docker containers make it easy to create a template for a new server that you want to spin up, such as a new web server, but you still have to connect all the servers together via configuration files when you start up the containers. This is especially problematic when you have an existing running cluster and want to add, say, another web server instance. You don’t want to take the whole site down just to add one node.

A better approach is for each new server, when it starts up, to register itself in a service registry such as etcd (which ships with CoreOS). Other interested servers can watch for newly created services and adjust their own configuration to point to the new service that just started. Similarly, they can watch for services that exit and remove them automatically as well.

In this post I describe a set of “Cluster Control” command line scripts written in PHP that implement this functionality on top of etcd. Each container runs several scripts to watch for new entries in etcd, send heartbeat updates to etcd, and shut down the current server if the server’s entry is removed from etcd. The code can be found on GitHub at https://github.com/alankent/cluster-control. At present it is more of a proof of concept to see what is possible, but so far I am liking the results.

Disclaimer: The opinions in this post are my own and not that of my employer.

Scenario

Consider an example situation where there is:

  • A single load balancer pointing to a set of web servers
  • Each web server points to a set of database servers (for load balancing and high availability; if any database server goes down, a different database server can be used)
  • A set of database servers

[Diagram: load balancer -> web servers -> database servers]

For the sake of this post, I have implemented all three server types as Apache servers with a single “index.php” script. (The server definitions can be found under the demo directory in the repository.) A later project will be to add support for Magento with other containers such as Redis caches.

The goal is then to support the following use cases:

  • A container can be added to an existing site simply by starting up a new container (e.g. a newly started web server will automatically be added to the load balancer)
  • A container can be shut down and removed by deleting the server’s entry in etcd (e.g. removing a database server will automatically deregister the database server from all web servers)
  • A container that crashes or is killed will automatically be removed from the cluster (after a period with no heartbeat, the server is automatically removed)

Please note I do not tackle the harder problem of guaranteeing that all operations will succeed even if a node crashes. My goal here is the simpler problem of providing the plumbing for a cluster to grow easily, shrink easily, and recover without human assistance if a server goes down unexpectedly (even though errors may occur for a short period of time).

The Building Blocks

There are a number of PHP scripts that run behind the scenes. The scripts are as follows. (I go into how to use them all together below.) All of the commands accept a --verbose command line option for printing out more details (such as etcd calls and responses). The commands also all accept a --conf argument to specify the configuration filename to use, defaulting to cluster-control.conf in the current directory.

A sample configuration file is as follows:

{
  "etcd": {
    "server": "http://{etcd-hostname}:4001/"
  },
  "self": {
    "key": "/clusterdemo/webserver/{public-host-and-port}",
    "ttl": 10,
    "heartbeat": 5
  },
  "clusters": [
    {
      "name": "dbserver",
      "path": "/clusterdemo/dbserver",
      "handler": "AlanKent\\ClusterControl\\Handlers\\JsonHandler",
      "handlerConfig": "html/dbservers.json"
    }
  ]
}

The settings in the configuration file are as follows.

  • “server” is the base URL for connecting to etcd.
  • “key” under “self” is the etcd key that this container will set to indicate it is alive. (Key names look like URL paths allowing key namespacing using the directory structure.)
  • “ttl” is the time to live in seconds for the key value stored in etcd. If a container goes down, its entry in etcd will last for at most this length of time. Setting it too small may generate a lot of update traffic to etcd; setting it too large may mean it takes a while to recover from availability problems.
  • “heartbeat” is how long to wait, in seconds, between sending update messages. The value of “heartbeat” should always be less than “ttl”, or else the key will expire between heartbeats, which will in turn cause the web server to shut down.
  • The “clusters” group allows multiple groups of child servers to be defined. This includes the name of the cluster, the etcd directory under which the keys for members of that cluster are created, and the handler code for writing out the new configuration details (see the example below). (A Varnish handler in the future could write out a configuration file in the format Varnish can load directly.)
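
For example, the JsonHandler used in the demo writes the current cluster members to the configured file as a plain JSON array of host:port strings (this matches the webservers.json contents shown in the demo output later in this post). On a web server watching the database cluster above, the file might look something like this:

$ cat html/dbservers.json
["10.64.255.235:8120","10.64.255.235:8121"]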

The supported commands are as follows.

$ bin/cluster-control cc:heartbeat [--once]

This command reads the configuration file and starts setting the “self key” with a time to live value. While things are all good and healthy, the node will keep generating periodic updates to the container’s key. The TTL for the key and the interval between setting of the key are read from the configuration file. The code first uses a “set key” API, then uses a “set key value but only if key already exists” etcd API. The latter means if the key is removed for any reason, the heartbeat will stop and the command will exit. This allows an administrator to shut down a server by simply removing its entry from etcd.

There is also a --once command line option that will generate a single heartbeat and exit. This is useful to guarantee a single heartbeat has occurred before starting the cc:watchkey command below.
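
For those curious about the underlying etcd v2 HTTP API, the heartbeat logic maps roughly onto the following two calls. This is only an illustrative sketch, not the library’s actual PHP code; the key and TTL come from the sample configuration above, and the KEY_URL variable and the stored value (“alive”) are arbitrary placeholders.

$ KEY_URL="http://{etcd-hostname}:4001/v2/keys/clusterdemo/webserver/10.64.255.235:8110"

# First heartbeat: create the key with a time to live (in seconds).
$ curl -s -X PUT "$KEY_URL" -d value=alive -d ttl=10

# Later heartbeats: refresh the TTL, but only if the key still exists.
# If the key has been removed, etcd returns an error and the heartbeat stops.
$ curl -s -X PUT "$KEY_URL?prevExist=true" -d value=alive -d ttl=10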

$ bin/cluster-control cc:removekey

This command removes the key for the current container. This script is typically called after the current server exits, to ensure etcd is updated immediately so other containers know this one is no longer available.
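
The underlying etcd call is simply a DELETE of the container’s key, roughly equivalent to the following (again an illustration of the etcd v2 API rather than the library’s code):

$ curl -s -X DELETE "http://{etcd-hostname}:4001/v2/keys/clusterdemo/webserver/10.64.255.235:8110"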

$ bin/cluster-control cc:watchkey

This command blocks until the key is deleted. If followed by a command to shut down the server, it can be used to quickly detect when someone outside the container removes the key from etcd, and thus cause the container to shut down. Without this command, the container would not react until its next heartbeat attempt failed.
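
In etcd v2 terms this is a watch on the container’s own key. A wait request returns on any change, including the periodic heartbeat refreshes, so a sketch of the idea needs to loop until a removal is reported. The following is an illustration only (the library implements this in PHP, and the crude sed parsing here is purely for demonstration):

KEY_URL="http://{etcd-hostname}:4001/v2/keys/clusterdemo/webserver/10.64.255.235:8110"
while true; do
  # Block until the key changes, then extract the reported action from the JSON response.
  ACTION=$(curl -s "$KEY_URL?wait=true" | sed 's/.*"action":"\([a-zA-Z]*\)".*/\1/')
  # "delete" means someone removed the key; "expire" means its TTL lapsed.
  if [ "$ACTION" = "delete" ] || [ "$ACTION" = "expire" ]; then
    break
  fi
done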

$ bin/cluster-control cc:clusterprepare {cluster}

This command should be invoked once per cluster the server is interested in, when the server is started up. A server may watch multiple clusters (e.g. a web server may watch a cluster of database servers and a cluster of Redis servers); a separate command would be run per cluster. This command returns the etcd “watch index”, an integer that should later be passed to the cc:clusterwatch command. Using this index ensures that no updates are missed, as normally the prepare command is run before the server is started, and the cc:clusterwatch command is started after the server is up and running. See the etcd documentation for more details.
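
Under the hood (illustrative only), the prepare step amounts to listing the cluster directory so the current members can be written to disk, and capturing the X-Etcd-Index response header as the starting point for the later watch. Something like the following would capture that index into the DBSERVER_INDEX variable used in the start up script shown below:

$ DBSERVER_INDEX=$(curl -s -i "http://{etcd-hostname}:4001/v2/keys/clusterdemo/dbserver" \
      | grep -i "^X-Etcd-Index:" | tr -d '\r' | awk '{print $2}')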

$ bin/cluster-control cc:clusterwatch {cluster} {index} {command and args} ...

This command watches for any changes in the specified cluster (starting at the specified watch index returned from the cc:clusterprepare command). When a change is spotted, a file on disk listing the cluster members is written, and the supplied command is run (typically to send a signal to the server so it knows to reload configuration settings from disk). For a web server, apachectl graceful might be used to do a graceful restart (letting current connections finish first).
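
The watch itself corresponds roughly to a blocking etcd request like the one below (again illustrative only). It returns as soon as anything under the cluster directory changes at or after the given index, after which the member list file is rewritten and the reload command is invoked:

$ curl -s "http://{etcd-hostname}:4001/v2/keys/clusterdemo/dbserver?wait=true&recursive=true&waitIndex=$DBSERVER_INDEX"
$ apachectl graceful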

Start Up Sequence

There is an example start up script using the above commands for each of the demo servers in the GitHub repository. There are slight modifications between the scripts to deal with different clusters to be watched. The main outline of commands is as follows (this is based on the demo/webserver/startserver script).

CLUSTERCONTROL="bin/cluster-control"
DBSERVER_INDEX=$($CLUSTERCONTROL cc:clusterprepare dbserver)
(
  PUBLIC_HOST_SLASH_PORT=$(echo $PUBLIC_HOST_AND_PORT | tr : /)
  echo 'Waiting for web server to accept connections...'
  while ! timeout 1 bash -c "cat < /dev/null > /dev/tcp/$PUBLIC_HOST_SLASH_PORT"; do
    echo 'Still waiting...'
  done
  echo 'Web server is up!'
  $CLUSTERCONTROL cc:heartbeat --once
  ($CLUSTERCONTROL cc:watchkey ; apachectl graceful-stop) &
  ($CLUSTERCONTROL cc:heartbeat ; apachectl graceful-stop ; sleep 5 ; apachectl stop) &
  ($CLUSTERCONTROL cc:clusterwatch dbserver $DBSERVER_INDEX apachectl graceful) & 
)&
/usr/sbin/apache2 -D FOREGROUND
$CLUSTERCONTROL cc:removekey
exit 0

The web server runs in foreground mode to guarantee that if it exits for any reason (e.g. a core dump), the container will exit as well (rather than hang around). (Another script, for example, could watch the number of running web servers in etcd and start a new instance automatically if required.) So I decided not to run the web server in the background. To get all the other scripts to start running only after the web server was ready to receive HTTP requests, I used a while loop to wait until it could connect to the web server. After that, all the auxiliary commands can be run.

The following describes each step in more detail:

  • The first line saves the path of the executable in a variable. Arguments such as --conf or --verbose may be added here as well.
  • The cc:clusterprepare command writes a fresh database server list file based on what is currently in etcd. This is guaranteed to complete before the Apache web server is started.
  • The big nested parenthesized group is a series of commands that need to run after the web server is up. It starts with a while loop that retries until it succeeds in connecting to the web server.
  • The cc:heartbeat --once command sends a single heartbeat, guaranteeing the key is set before the following cc:watchkey command runs (so it does not mistakenly think a request to shut down the server arrived before the server had even really started).
  • The cc:watchkey command watches to see if someone external removes the key from etcd. If so, the current web server is immediately shut down. This is optional (the removal would eventually be detected via the missing heartbeat) but it helps the cluster respond quickly, allowing a longer delay between heartbeats. (The heartbeats are there for emergency situations only.)
  • The cc:heartbeat command updates the container’s key periodically with a value that will time out. If something goes wrong (e.g. the host crashes), the key value will expire in etcd and the container will thus automatically be removed from the other containers’ configuration. If the heartbeat stops, the server is shut down. Because this is more of a last-resort stop, it does a graceful stop, waits a little while, then does a hard stop.
  • The next command is a background task that watches for changes in a cluster (have any new hosts joined, or have any existing hosts left?), triggering a graceful Apache restart upon change. It uses the index returned by cc:clusterprepare to make sure it does not miss any events that occurred while the server was starting up.
  • The Apache 2 web server is then started in foreground mode; the script does not move on to the next command until the web server exits.
  • When the web server exits for any reason, the cc:removekey command that follows removes the container’s key. This tells other containers immediately that this container is exiting. (The key’s TTL would remove it a short time later anyway, without any special action being required.) Because the other servers are listening for etcd updates, they should quickly remove the server from their local configuration.
  • The final command is exit 0 to make sure the exit status of the cc:removekey command is not returned as the shell script’s exit status.

A bit complicated I know, but the end result is a solution that can deal with a range of problems that can occur.

Installation

If you want to get the scripts going in a demo, you will first need to get a CoreOS server up and running.  (It does not strictly need to be CoreOS, but it comes with Docker and etcd out of the box. I found it easier that way.)

If it is not already running, start up an etcd instance.

$ etcdctl ls /
Error: Cannot sync with the cluster using peers 127.0.0.1:4001
$ /usr/bin/etcd &
...lots of output...
$ etcdctl mkdir /foo
$ etcdctl ls /
/foo
$ etcdctl rmdir /foo

In production you would most likely have an etcd instance on every host, with a single load balancer, web server, and database server container running on that host. This avoids external network problems preventing the scripts from talking to the local etcd server. (The etcd server can worry about recovering from network problems.)

The supplied sample demo script and examples create many lightweight mock servers on the one host to avoid the complexities of setting up a cluster of etcd servers for the purposes of this blog. It is important to note, however, that no changes are required to make this operate across multiple hosts other than configuring the etcd servers to talk to each other.

Once etcd is running, the following are some sample commands you can run.

$ etcdctl ls /
$ etcdctl --debug ls /
Cluster-Peers: http://127.0.0.1:4001
Curl-Example: curl -X GET http://127.0.0.1:4001/v2/keys/?consistent=true&recursive=false&sorted=false
$ etcdctl mkdir /foo
$ etcdctl set /foo/bar "test"
$ etcdctl ls /foo
/foo/bar
$ etcdctl get /foo/bar
test
$ etcdctl rm /foo/bar
$ etcdctl rmdir /foo

Next, on your CoreOS host get a copy of the Cluster Control library from GitHub.

$ git clone https://github.com/alankent/cluster-control.git

You can go into the demo directory and run the setup-etcd shell script from that directory, or follow through the rest of this blog post line by line. You will need to change the “public IP” address of the current host in the setup script; ifconfig can be used to find the IP address. The setup-etcd script can be run multiple times if desired (although there may be some error messages about things that already exist). I am going to go through the commands step by step here and explain what they are all for. (The setup-etcd script may differ slightly from the following.) If you just want to get it up and going, running the setup-etcd script is a good option.

$ ETCD_URL=http://172.17.42.1:4001/
$ PUBLIC_IP=10.64.255.235

Note these do not need to be environment variables – they are just to make the following commands easier to adjust for different IP addresses.

On my host, PUBLIC_IP is the IP address at which the server can be reached externally. This is important if you really do have multiple hosts involved. The ETCD_URL could use the public IP, but it is only used to connect to the local etcd server. You can see where I got these numbers from in the following ifconfig output. (Look for ‘inet’ below.)

$ ifconfig | less
docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
 inet 172.17.42.1 netmask 255.255.0.0 broadcast 0.0.0.0
 inet6 fe80::5484:7aff:fefe:9799 prefixlen 64 scopeid 0x20<link>
 ether 56:84:7a:fe:97:99 txqueuelen 0 (Ethernet)
 RX packets 16122361 bytes 976161982 (930.9 MiB)
 RX errors 0 dropped 0 overruns 0 frame 0
 TX packets 37320506 bytes 58202222682 (54.2 GiB)
 TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

ens3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
 inet 10.64.255.235 netmask 255.255.255.0 broadcast 10.64.255.255
 inet6 fe80::76db:d1ff:fe80:ee51 prefixlen 64 scopeid 0x20<link>
 ether 74:db:d1:80:ee:51 txqueuelen 1000 (Ethernet)
 RX packets 13133729 bytes 3402464957 (3.1 GiB)
 RX errors 0 dropped 0 overruns 0 frame 0
 TX packets 11043394 bytes 1895244186 (1.7 GiB)
 TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

...

Next we need to set up some key namespace directories in etcd. You only need to do this once.

$ etcdctl mkdir /clusterdemo
$ etcdctl mkdir /clusterdemo/loadbalancer
$ etcdctl mkdir /clusterdemo/webserver
$ etcdctl mkdir /clusterdemo/dbserver
$ etcdctl ls /clusterdemo
/clusterdemo/loadbalancer
/clusterdemo/webserver
/clusterdemo/dbserver

Docker images need to be created for the demos. (I may push these up to Docker hub at some stage if useful.)

$ cd demo/loadbalancer
$ docker build -t cluster-control-demo-loadbalancer .
$ cd ../webserver
$ docker build -t cluster-control-demo-webserver .
$ cd ../dbserver
$ docker build -t cluster-control-demo-dbserver .

Now that the images are built, we can spin up a few instances to get a first cluster going.

$ cd ../loadbalancer
$ docker run -d -e ETCD_URL=$ETCD_URL -e PUBLIC_HOST_AND_PORT=${PUBLIC_IP}:8100 \
     -p 8100:80 cluster-control-demo-loadbalancer
$ cd ../webserver
$ docker run -d -e ETCD_URL=$ETCD_URL -e PUBLIC_HOST_AND_PORT=${PUBLIC_IP}:8110 \
     -p 8110:80 cluster-control-demo-webserver
$ docker run -d -e ETCD_URL=$ETCD_URL -e PUBLIC_HOST_AND_PORT=${PUBLIC_IP}:8111 \
     -p 8111:80 cluster-control-demo-webserver
$ docker run -d -e ETCD_URL=$ETCD_URL -e PUBLIC_HOST_AND_PORT=${PUBLIC_IP}:8112 \
     -p 8112:80 cluster-control-demo-webserver
$ cd ../dbserver
$ docker run -d -e ETCD_URL=$ETCD_URL -e PUBLIC_HOST_AND_PORT=${PUBLIC_IP}:8120 \
     -p 8120:80 cluster-control-demo-dbserver
$ docker run -d -e ETCD_URL=$ETCD_URL -e PUBLIC_HOST_AND_PORT=${PUBLIC_IP}:8121 \
     -p 8121:80 cluster-control-demo-dbserver

Note that none of the configuration mentioned the hosts and ports of other servers; each container was only told its own IP address and port number. These services can be seen in etcd using the following commands.

$ etcdctl ls /clusterdemo/loadbalancer
/clusterdemo/loadbalancer/10.64.255.235:8100
$ etcdctl ls /clusterdemo/webserver
/clusterdemo/webserver/10.64.255.235:8110
/clusterdemo/webserver/10.64.255.235:8111
/clusterdemo/webserver/10.64.255.235:8112
$ etcdctl ls /clusterdemo/dbserver
/clusterdemo/dbserver/10.64.255.235:8120
/clusterdemo/dbserver/10.64.255.235:8121

Ok, so let’s hit the server!

$ curl -s http://${PUBLIC_IP}:8100/
Reading webservers.json...
webservers.json: ["10.64.255.235:8110","10.64.255.235:8111","10.64.255.235:8112"]

10.64.255.235:8110: 10.64.255.235:8120 10.64.255.235:8121
10.64.255.235:8111: 10.64.255.235:8120 10.64.255.235:8121
10.64.255.235:8112: 10.64.255.235:8120 10.64.255.235:8121

Done

The 3 lines above ‘Done’ are the important ones. The three web servers (8110, 8111, and 8112) each have a connection to both database servers (8120 and 8121). (On a real distributed implementation running on different servers the IP addresses would differ rather than the port numbers.)

Let’s start up one more database server (8122) and see what happens.

$ docker run -d -e ETCD_URL=$ETCD_URL -e PUBLIC_HOST_AND_PORT=${PUBLIC_IP}:8122 \
     -p 8122:80 cluster-control-demo-dbserver

If you immediately check to see if the new database server is registered you might not find it.

$ curl -s http://10.64.255.235:8100/
Reading webservers.json...
webservers.json: ["10.64.255.235:8110","10.64.255.235:8111","10.64.255.235:8112"]

10.64.255.235:8110: 10.64.255.235:8120 10.64.255.235:8121
10.64.255.235:8111: 10.64.255.235:8120 10.64.255.235:8121
10.64.255.235:8112: 10.64.255.235:8120 10.64.255.235:8121

Done
$ etcdctl ls /clusterdemo/dbserver
/clusterdemo/dbserver/10.64.255.235:8120
/clusterdemo/dbserver/10.64.255.235:8121

What is going on? Most likely the new server has not quite finished starting up and running all its scripts yet. Wait a bit longer and try again.

$ sleep 10
$ curl -s http://10.64.255.235:8100/
Reading webservers.json...
webservers.json: ["10.64.255.235:8110","10.64.255.235:8111","10.64.255.235:8112"]

10.64.255.235:8110: 10.64.255.235:8120 10.64.255.235:8121 10.64.255.235:8122
10.64.255.235:8111: 10.64.255.235:8120 10.64.255.235:8121 10.64.255.235:8122
10.64.255.235:8112: 10.64.255.235:8120 10.64.255.235:8121 10.64.255.235:8122

Done
$ etcdctl ls /clusterdemo/dbserver
/clusterdemo/dbserver/10.64.255.235:8120
/clusterdemo/dbserver/10.64.255.235:8121
/clusterdemo/dbserver/10.64.255.235:8122

As you can see, all 3 web servers now see the new database server. (If you are freakishly lucky you might hit it at just the right time where some web servers have been updated and others have not. That should be rare.)

So how do you remove a web server? Rather than having to log onto the CoreOS host and kill the Docker container, the cc:watchkey command can be used to detect when the entry is deleted and trigger a server shutdown. Thus to shut down a server you just remove its key.

$ etcdctl rm /clusterdemo/webserver/10.64.255.235:8110
$ curl -s http://10.64.255.235:8100/
Reading webservers.json...
webservers.json: ["10.64.255.235:8111","10.64.255.235:8112"]

10.64.255.235:8111: 10.64.255.235:8120 10.64.255.235:8121 10.64.255.235:8122
10.64.255.235:8112: 10.64.255.235:8120 10.64.255.235:8121 10.64.255.235:8122

Done
$ etcdctl ls /clusterdemo/webserver
/clusterdemo/webserver/10.64.255.235:8111
/clusterdemo/webserver/10.64.255.235:8112

What if a web server dies? We can simulate that using docker kill. The following looks up the container using port 8111 and kills it.

$ docker kill $(docker ps -a | grep cluster-control-demo | grep 8111 | awk '{print $1}')

It may take a little while for the server’s key to expire from etcd once the heartbeats stop. In the meantime, requests to the dead web server will fail. (This is where a good load balancer helps: if it can redirect the request to a working server, the client does not notice the failure.)

$ curl -s http://10.64.255.235:8100/
Reading webservers.json...
webservers.json: ["10.64.255.235:8111","10.64.255.235:8112"]

10.64.255.235:8111: (no response)
10.64.255.235:8112: 10.64.255.235:8120 10.64.255.235:8121 10.64.255.235:8122

Done
$ etcdctl ls /clusterdemo/webserver
/clusterdemo/webserver/10.64.255.235:8111
/clusterdemo/webserver/10.64.255.235:8112
$ sleep 10
$ curl -s http://10.64.255.235:8100/
Reading webservers.json...
webservers.json: ["10.64.255.235:8112"]

10.64.255.235:8112: 10.64.255.235:8120 10.64.255.235:8121 10.64.255.235:8122

Done
$ etcdctl ls /clusterdemo/webserver
/clusterdemo/webserver/10.64.255.235:8112

What is Next

Using the supplied scripts and etcd, it is fairly easy to implement a cluster where nodes register themselves on startup and clean up after themselves on exit. This makes it much easier to add new hosts to an existing cluster.

The current code base can only write a JSON file; this needs to be extended to allow writing a range of file formats, such as Magento configuration (say, for read-only database replicas).

My next goal after that is to get some good standard Docker definitions for MySQL, the Apache 2 web server, Redis, and other containers, and have those containers use this library. That would make it trivial to add a new web server node to an existing site to ramp up capacity before a special seasonal event.

If you are interested in collaborating on the development of standard Docker container definitions (e.g. good settings for Apache2 or MySQL) specifically for Magento 1.X or Magento 2, please drop me a line or leave a comment.
