An evolving platform’s servers should also evolve and scale over time. Small apps start with one server. If your platform grows, it expands to include several servers. If you achieve a complex, well-designed DevOps platform, you end up with a server farm. In the growth process, your servers develop from behaving like pets (cats or dogs), to acting like cattle, and finally, to behaving like an ant colony. This comparison might sound crazy, but it will make total sense when you read on.
The dog — one server –
In the beginning, you probably only have one server. It might have a self-explanatory name like “myplatform-production.” And you probably give it lots of attention like configuring all the details and making sure it works well. This server is in the pet phase. You recognize it quickly by using its name, and it needs a lot of love. If it has any problems, you’ll have to immediately check what’s wrong and try to fix it. It’s just like having a dog and being responsible for its feeding, health, and emotional needs.
Being more specific, you’ll connect to your server using SSH, pull the latest code and make sure everything is running smoothly. You can even use some PaaS (Platform as a Service) like Heroku. Maybe you want to create scripts to automate parts of this, it still requires a lot of love and oftentimes you’ll be called because the server is down (probably having run out of memory due to big log files).
Cattle — many servers
After you scale your platform a bit, you can’t treat it like a dog anymore. You’ll probably need multiple environments, and you’ll give them more generic names. You won’t call your server by its name (maybe just a meaningful label), and you’ll have to create tools to automate most of the work. It should be easier to set up a new server or upgrade an existing one. Now that you have several servers, you’ll treat them more like cattle. You won’t give them as much love, and you’ll need mechanisms and automation to take care of them. If they were cattle, you’d need a close water source, grass and grains for them to feed on, and maybe a shepherd dog. They feed and water themselves, so some work is done automatically, but you still need to perform some work and oversight. This degree of attention and love is similar to what your servers need when you reach a certain level of automation.
If you’ve reached this level, you have an automated deployment process, server health monitors and your platform will probably still work after a hardware failure (or at least recover automatically after a few minutes). Ideally, you are using containers and orchestration (Dockers, Kubernetes), you have a solid architecture with multiple components (centralized logging like ELK stack, a cloud database like Amazon RDS) and load balancers. You’ll be proud of your Continuous Integration server and your platform is pretty resilient to failure.
Ant colony — a server farm
Ultimately, your goal is to have a farm of servers that run like an ant colony. You don’t give them any love, they don’t have names that you recognize, and they can survive no matter what happens. You don’t have to worry about the resources they need because an ant colony solves its own problems. This type of efficient environment is auto-scalable and survives calamities. Creating a self-managing, coordinated platform helps guarantee the long-term operation of your system.
At this point, you manage your entire infrastructure automatically, all the way from code to production. You have been using Terraform or Ansible heavily, and any change — let’s say updating a Linux package across hundreds of servers, can be orchestrated by tools. Your application has a level of testing automation and pipeline management, that lead you to a CI/CD environment which automatically manages, creates, destroys and promotes multiple environments on-demand based on continuous monitoring and scalability needs.
Congratulations! You’ve done it! Now, wait for the next failure, security flaw or scalability issue to re-design, re-create and transform your business again. Always remember, software development is agile.
Ant colony example
An excellent example of this is Netflix. The platform runs “Chaos Monkey”, which randomly terminates virtual machine instances and containers that run inside the Netflix production environment. As a user, you won’t notice the crazy software program randomly terminating production servers. Complex systems that support platforms like Netflix need to be resilient by design. They should be built like an ant colony.
The science of ants
The communication pathways and foraging behavior of ant colonies inspired the Ant Colony System algorithm [Dorigo1996a] [Gambardella1995], which inspired automated computer programming. Ant colony programming creates computer programs automatically from the goals you specify.
Making your servers run with the automated science of ants should be your DevOps long-term goal.
To join our community Slack 🗣️ and read our weekly Faun topics 🗞️,click here⬇