It's a well-known fact that deploying and managing anything at scale is hard. Docker is no different. However, the engineers at Docker recognize this and are working on three products to help: Docker Machine, Docker Compose, and Docker Swarm.
At the highest level, Machine makes it easy to spin up Docker hosts in your environment, Compose makes it easier to deploy complex distributed apps on Docker, and Swarm enables native clustering for Docker.
All of these are new technologies and currently under development at Docker Inc. All are pluggable, meaning they're optional and can be swapped out in favor of third-party products that do the same job. This follows the Docker philosophy of "batteries included but removable." If you find a third-party tool that works better for you, you're free to use it to drive Docker instead.
In this article, we'll take a quick look at Docker Swarm.
What is Docker Swarm?
If Swarm is native clustering for Docker, we need to define "clustering." In the context of Swarm, a cluster is a pool of Docker hosts that acts a bit like a single large Docker host. Instead of the hassle of having to decide which host to start every container on, we can tell Swarm to start our containers. In the background, Swarm magically decides which nodes to start them on. This is a great feature and a core tenet of cloud computing.
But for readers familiar with conventional fail-over clustering, Swarm is not that -- yet.
At the time of writing (version 0.2), Swarm does not support container fail-over. This means that when a node in the cluster fails, the containers it was hosting will not be restarted on another node.
This is probably the biggest weakness in the current version of Swarm. Now that you know what a Swarm cluster effectively is, let's look at some of the good things it currently offers.
Swarm is easy
There's a lot to like about Swarm. First and foremost, it's easy. Compared to clustering technologies I've worked with in the past, Swarm is an absolute walk in the park. Building a Swarm cluster takes three simple steps:
1. Create the cluster definition.
2. Add hosts to the cluster.
3. Create a cluster manager.
Each step is done with a single command. That's it!
Note that the command to join hosts to the cluster has to be run on each host joining the cluster. Once hosts are joined to the cluster, they are referred to as nodes.
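To make the three commands concrete, here is a minimal sketch that only builds the command lines and prints them; it does not run anything. The `swarm create`, `swarm join`, and `swarm manage` subcommands reflect how the Swarm container image was driven at the time of writing, as far as I recall, so treat the exact flags as assumptions and check them against the version you run. The cluster ID, IP address, and port are hypothetical placeholders.

```python
import shlex

# Assumption: Swarm is run from the public "swarm" image on Docker Hub.
SWARM_IMAGE = "swarm"


def create_cluster_cmd():
    """Step 1: create the cluster definition (prints a cluster ID)."""
    return ["docker", "run", "--rm", SWARM_IMAGE, "create"]


def join_cmd(cluster_id, node_addr):
    """Step 2: run on EACH host; node_addr is that host's own ip:port."""
    return ["docker", "run", "-d", SWARM_IMAGE, "join",
            f"--addr={node_addr}", f"token://{cluster_id}"]


def manage_cmd(cluster_id, manager_port):
    """Step 3: start the cluster manager, published on manager_port."""
    return ["docker", "run", "-d", "-p", f"{manager_port}:2375",
            SWARM_IMAGE, "manage", f"token://{cluster_id}"]


def render(cmd):
    """Turn an argument list into a copy-and-paste shell command."""
    return " ".join(shlex.quote(part) for part in cmd)


if __name__ == "__main__":
    cluster_id = "abc123"  # hypothetical; use the ID printed by step 1
    print(render(create_cluster_cmd()))
    print(render(join_cmd(cluster_id, "192.0.2.11:2375")))
    print(render(manage_cmd(cluster_id, 4000)))
```

Keeping the commands as argument lists makes it easy to hand them to `subprocess.run` later, once you are happy with what would be executed.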
You already know Swarm
Once your Swarm cluster is up and running, you'll feel right at home with it. Swarm clusters look, smell, and feel like normal Docker. If you know Docker, you know Swarm. This is because Swarm is mostly compatible with the Docker Remote API.
Basically, every Docker host runs a client and a daemon process. The daemon process implements the Docker Remote API endpoints, and the client talks to the daemon via HTTP-based API calls -- via commands like docker run, docker ps, docker info, and the rest. Because Swarm implements most of those API endpoints, most of the regular old Docker commands you already know still work.
Obviously there will be slight differences. For example, rather than returning information specific to a single Docker host, the docker info and docker ps commands return info related to the entire cluster when run within a Swarm cluster.
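The practical upshot is that the same client code can talk to a single daemon or to a Swarm manager just by changing the address. The sketch below illustrates that idea with plain HTTP calls to the `/info` and `/containers/json` endpoints of the Docker Remote API. The addresses are hypothetical, and it assumes the API is exposed over unencrypted TCP, which you would not want outside a lab.

```python
import json
from urllib.request import urlopen


def api_url(endpoint, path):
    """Build a Remote API URL for a daemon or a Swarm manager (host:port)."""
    return f"http://{endpoint}{path}"


def get_json(endpoint, path, timeout=5.0):
    """Issue a GET request and decode the JSON response body."""
    with urlopen(api_url(endpoint, path), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical addresses: one ordinary Docker host, one Swarm manager.
    single_host = "192.0.2.11:2375"
    swarm_manager = "192.0.2.10:4000"

    # The calls are identical; only the scope of the answer differs.
    for endpoint in (single_host, swarm_manager):
        info = get_json(endpoint, "/info")
        containers = get_json(endpoint, "/containers/json")
        print(endpoint, "->", len(containers), "running containers")
        print(sorted(info.keys()))
```

Pointed at the single host, the answers describe that host. Pointed at the manager, they describe the whole cluster.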
This is great. The learning curve is so small it's barely worth calling a curve.
Scheduling is simple
Launching containers in a Swarm cluster is usually called scheduling. Swarm currently has three algorithms for deciding which nodes in the cluster to schedule new containers on: Spread, BinPack, and Random.
Spread is the default. It tries to balance containers evenly across all nodes in the cluster. To do so, it takes into account each node's available CPU and RAM, as well as the number of containers it's already running.
BinPack is the opposite of Spread. It works by scheduling all containers on a single node until that node is fully utilized. Then it starts scheduling containers on the next node in the cluster. A major goal of BinPack is to use as little infrastructure as possible -- great if you're paying for instances in the cloud. It gets its name from the fact that its modus operandi is similar to how we fill bins (trash cans): fill one to the top before starting to fill the next.
Random is, well, random.
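To see how differently the three strategies behave, here is a toy model of them. It is a deliberately simplified sketch, not Swarm's actual scoring code: it tracks only memory and container counts, and the `Node` class, the `schedule` function, and the node names are all made up for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    total_mem: int       # MB of RAM on the node
    used_mem: int = 0    # MB already reserved by containers
    containers: int = 0  # number of containers placed so far

    def fits(self, mem):
        return self.used_mem + mem <= self.total_mem

    def utilization(self):
        return self.used_mem / self.total_mem


def pick_node(nodes, mem, strategy="spread", rng=None):
    """Choose a node for a container that needs `mem` MB of RAM."""
    candidates = [n for n in nodes if n.fits(mem)]
    if not candidates:
        raise RuntimeError("no node has enough free memory")
    if strategy == "spread":
        # Least loaded first: fewest containers, then lowest utilization.
        return min(candidates, key=lambda n: (n.containers, n.utilization()))
    if strategy == "binpack":
        # Most loaded node that still has room.
        return max(candidates, key=lambda n: n.utilization())
    if strategy == "random":
        return (rng or random).choice(candidates)
    raise ValueError(f"unknown strategy: {strategy}")


def schedule(nodes, mem, strategy="spread", rng=None):
    """Place one container and return the name of the chosen node."""
    node = pick_node(nodes, mem, strategy, rng)
    node.used_mem += mem
    node.containers += 1
    return node.name


if __name__ == "__main__":
    for strategy in ("spread", "binpack", "random"):
        cluster = [Node("node-a", 1024), Node("node-b", 1024)]
        placed = [schedule(cluster, 256, strategy) for _ in range(5)]
        print(strategy, placed)
```

With two 1GB nodes and five 256MB containers, this model's Spread alternates between the nodes, while BinPack fills the first node with four containers before touching the second.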
Scheduling is powerful
One of my favorite Swarm features is Constraints. Swarm lets you tag nodes in the cluster with your own custom labels. You can then use these labels to constrain which nodes a container can start on.
For example, you can label nodes according to geographic location such as "London," "NYC," and "Singapore." But it doesn't stop there. Each node can be tagged with multiple labels. You could keep going and tag nodes according to the zone they're deployed in, such as "production" or "development." You might even add another label for platform: "Azure," "AWS," "on-prem," and so on.
Leveraging these labels, you can easily schedule containers to start only on nodes tagged as "London," "production," and "on-prem," all via the usual docker run command.
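Conceptually, a constraint is just a filter over node labels that runs before the scheduling algorithm picks a node. The following sketch shows that filtering step in isolation. The label keys (`location`, `zone`, `platform`) and the node names are hypothetical, and real Swarm constraints support more than the exact-match comparison used here.

```python
def matches(node_labels, constraints):
    """True if the node carries every requested label with the same value."""
    return all(node_labels.get(key) == value
               for key, value in constraints.items())


def eligible_nodes(cluster, constraints):
    """Return the names of nodes that satisfy all constraints."""
    return [name for name, labels in cluster.items()
            if matches(labels, constraints)]


# Hypothetical cluster: each node is tagged with several labels.
CLUSTER = {
    "node-1": {"location": "london", "zone": "production", "platform": "on-prem"},
    "node-2": {"location": "london", "zone": "development", "platform": "aws"},
    "node-3": {"location": "nyc", "zone": "production", "platform": "azure"},
}

if __name__ == "__main__":
    wanted = {"location": "london", "zone": "production", "platform": "on-prem"}
    print(eligible_nodes(CLUSTER, wanted))
    print(eligible_nodes(CLUSTER, {"location": "london"}))
```

If I recall correctly, the Swarm of this era accepted such constraints as environment variables on docker run, in the form `-e constraint:location==london`, but check the documentation for your version before relying on that syntax.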
Labels are insanely simple, but massively powerful.
What's missing in Swarm
OK, Swarm is awesome, but it's by no means the finished article. Here I'll point out what I think are the most important features yet to be added.
We've already said that Swarm doesn't support container fail-over (yet): when a node fails, its containers are not automatically restarted on another node. This is a major feature gap, but one I expect to be plugged very soon.
There's also no HA (high availability) for the Swarm Manager process (yet). I wouldn't be surprised to see HA for the Swarm Manager by the time Swarm is released as version 1.0.
Integration with networking and storage is understandably limited. Because networking and storage are two core features of Docker that are relatively immature and still under heavy development, I expect Swarm integration with these to take longer to arrive.
Currently all nodes in a Swarm cluster need to run the same version of Docker -- which will no doubt pose challenges when it comes time to upgrade your cluster to the latest version of Docker.
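Until that restriction is lifted, it is worth checking for version drift before adding a node or starting an upgrade. Here is a small sketch of such a check. It assumes you have already collected each node's Docker version by some means (for example, from `docker version` on each host), and the node names and version strings are made up.

```python
from collections import defaultdict


def group_by_version(node_versions):
    """Map each Docker version to the list of nodes running it."""
    groups = defaultdict(list)
    for node, version in node_versions.items():
        groups[version].append(node)
    return dict(groups)


def is_uniform(node_versions):
    """True if every node reports the same Docker version."""
    return len(set(node_versions.values())) <= 1


if __name__ == "__main__":
    # Hypothetical inventory: node name -> reported Docker version.
    inventory = {"node-1": "1.6.0", "node-2": "1.6.0", "node-3": "1.5.0"}
    if not is_uniform(inventory):
        for version, nodes in sorted(group_by_version(inventory).items()):
            print(version, nodes)
```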
Removing the batteries
The time and effort that Docker Inc. is putting into Swarm and other orchestration components such as Machine and Compose tell me two things: The company recognizes the challenges of containers at scale, and it's serious about making Docker a platform for large enterprises.
However, Swarm is not without competitors. Products such as Mesosphere are more mature and more suitable for large-scale deployments, and they can easily replace Swarm. While this fits nicely with the philosophy of "batteries included but removable," it could become tricky for Docker Inc. On the one hand, Docker Inc. wants a strong partner ecosystem. On the other hand, it needs to make money.
Drawing on the example of a virtualization giant: VMware is all-guns-blazing committed to delivering customers the entire stack -- often at the expense of its third-party ecosystem. Docker Inc. may find it hard not to follow suit as turning a profit becomes increasingly important.
One final point -- something of a personal plea: The lack of sys admins and IT operations people involved with Docker is a concern. Docker looks set to become a fundamental component of IT infrastructure, and we're staring down the barrel of a huge shortage of sys admins who grok it. The last thing we need is core infrastructure like Docker placed solely in the hands of developers, who, as good as they may be, do not have the experience of running production infrastructure.
To sys admins out there: Get yourself involved with any deployment of Docker within your organization before it's too late. In a traditional environment, developers should own what's inside the container, and operations should own what's outside.