Brian Wagner | Blog

Software Updates, In-Place or Build and Replace

Jan 9, 2019 | Last edit: Jan 9, 2019

What is Blue-Green Deployment?

Software is designed to be updated. Often. The question is how to do that: update in place, or rebuild and replace?

On a desktop computer, you would never expect to rebuild the machine when you can simply run an update package on the current platform. Software running on a server asks us to think differently. Why?

We expect servers to operate with little or no downtime. Running an update in place would (should!) mean taking the system offline for visitors, even if only briefly. Otherwise, we are compromising the performance and availability of the service in ways we cannot anticipate. I've never heard of anyone testing the user load on a server while its package manager is performing an update.

Another difference is that servers tend not to offer helpful UI tools for managing updates and installations. It can be nerve-racking to type and run a brand-new command that will affect software on the whole machine. "Worked on my machine" are words we hope not to hear.

For these and other reasons, one common pattern for managing software updates on servers is called blue-green deployment. It calls for two production environments, one called "blue" and the other "green." One is the current live production system; the other is the future production system.

The future stack does not need to run in parallel for a long time before going live, but it must contain all the components needed to operate independently: the code, database, cache, file server, load balancer, etc. There are various ways to test the components and the whole stack in a development setting, with Docker or virtualization systems. But there is only one way the system will function in production ... and that's on production equipment.

Blue-Green Deployment in Practice

A typical deployment begins after development is complete, when we create the new environment. Imagine we have "blue" servers in production, and we need to create "green."

NOTE: these colors are standard in the industry, for reasons I don't know. Once we tried to use red/pink, but people got mad. There are different ways to track which environment is "blue" and which is "green." Cloud providers let you assign arbitrary tags to a release, though I have not seen anything specific to blue/green. Depending on the type of system, we can use environment variables to automatically add the words "blue" or "green" to the names of server instances, etc. Of course, we can always do this manually and keep a Post-it note on the desk as well. A sketch of the tagging approach follows.
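For instance, here is a minimal sketch of tagging servers with their color, assuming AWS and boto3; the instance ID below is a hypothetical placeholder.

  # Minimal sketch: record each server's deployment color as a tag,
  # assuming AWS and boto3. The instance ID below is hypothetical.
  import boto3

  ec2 = boto3.client("ec2")

  def tag_environment(instance_ids, color):
      """Attach a Color tag ("blue" or "green") to the given instances."""
      ec2.create_tags(
          Resources=instance_ids,
          Tags=[{"Key": "Color", "Value": color}],
      )

  tag_environment(["i-0123456789abcdef0"], "green")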

Steps involved (a sketch of steps 3-4 follows the list):

  1. Provision servers
  2. Deploy code and custom configuration
  3. Verify the green environment
  4. Re-point DNS to the green environment
  5. Verify green is working as production; if not, we can revert to blue
  6. Create a backup of the blue environment (optional)
  7. Tear down blue environment
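To make steps 3 and 4 concrete, here is a minimal sketch assuming AWS Route 53 and boto3. The hosted zone ID, domain, green address, and /healthz endpoint are all hypothetical placeholders; reverting to blue (step 5) is the same UPSERT with the old address.

  # Sketch of steps 3-4: verify green directly, then re-point DNS.
  # Zone ID, domain, green IP, and /healthz are hypothetical.
  import urllib.request

  import boto3

  GREEN_IP = "203.0.113.10"    # hypothetical green server address
  ZONE_ID = "Z0000000EXAMPLE"  # hypothetical Route 53 hosted zone
  DOMAIN = "www.example.com."

  def green_is_healthy():
      """Hit the green stack by IP, before any DNS change."""
      req = urllib.request.Request(
          f"http://{GREEN_IP}/healthz",
          headers={"Host": DOMAIN.rstrip(".")},
      )
      with urllib.request.urlopen(req, timeout=5) as resp:
          return resp.status == 200

  def point_dns_at(ip):
      """UPSERT the A record; re-run with the blue IP to roll back."""
      boto3.client("route53").change_resource_record_sets(
          HostedZoneId=ZONE_ID,
          ChangeBatch={"Changes": [{
              "Action": "UPSERT",
              "ResourceRecordSet": {
                  "Name": DOMAIN,
                  "Type": "A",
                  "TTL": 60,
                  "ResourceRecords": [{"Value": ip}],
              },
          }]},
      )

  if green_is_healthy():
      point_dns_at(GREEN_IP)

A short TTL on the record keeps the rollback window tight, since resolvers pick up the change quickly.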

What are the Benefits of Blue-Green Deployments?

The blue-green pattern enables us to isolate the future state from the current state, and that's important for several reasons.

  • isolate potential bugs
  • isolate code testing
  • isolate performance verification, load testing
  • eliminate update artifacts
  • reinforce documentation and the resilience of the system

It's always wise to perform QA, automated testing, and/or load testing on any system, and the safest place to do so is one with no real users. That isolation is great, but the latter two reasons above are the real focus.

Software updates may require package managers or code-convergence tools to merge the old with the new. Linux systems, for example, may throw you into a selection screen where you have to choose whether to accept a new configuration file, keep your old one, or perform some sort of merge. Woe betide the person trying to resolve that conflict in a terminal in the middle of an update cycle. Even in the simplest of cases, what happens with the files in question? Do you have to add that configuration file to your codebase or version control? The update itself may have generated changes.

For me, the number one reason to consider blue-green deployment is the last one listed above. Can you rebuild your system? Outages can happen for various reasons: security breach, equipment failure, engineer error. Better yet, your team sees a strategic advantage in changing hosting providers: are you locked in, or can you jump to a new host? Finally, do we know that our development environments are an accurate mirror of production?

Building a new server helps to reinforce the system and make the entire process, and the team, more resilient. That can happen through documentation, automation, or teamwork.

If the deployment process is a manual one, then proper documentation will guide you through it. Checklists are a great way to capture the steps involved, and in some cases the reasoning behind decisions. Run into a hole in the checklist? You've just discovered a new opportunity to refine the docs! Find a spot where it's unclear why we select option A, and not B? That's another potential improvement we can make! This type of documentation is best when we constantly review it, to ensure it's up-to-date, accurate and complete.

Automation tools like Ansible, Chef, Puppet, and others can manage most, if not all, of the deployment tasks as well. These scripts represent their own form of documentation. Want to build a development server for a new dev? You don't even need to figure out how to run the Ansible playbook on a local machine; just step through the playbook and repeat the commands by hand.
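As a hypothetical illustration of a script doubling as documentation, the deploy steps can be written so they read as a checklist even if you never execute the file; the helper scripts named here are placeholders.

  # A deploy script that doubles as a checklist. The helper scripts
  # are hypothetical placeholders; reading the list top to bottom
  # tells a new dev what the deployment actually does.
  import subprocess

  STEPS = [
      ("provision green servers", ["./provision.sh", "green"]),
      ("deploy code and config",  ["./deploy.sh", "green"]),
      ("run smoke tests",         ["./smoke_test.sh", "green"]),
  ]

  for description, command in STEPS:
      print(f"==> {description}: {' '.join(command)}")
      subprocess.run(command, check=True)  # stop at the first failure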

Finally, teams exist to share knowledge, responsibility, and success. Deploying servers as a team can be a great opportunity for each member to play a role and showcase their skills. DevOps can deploy the new environment. "Wow, that was fast." Developers can verify the new feature being deployed. "Excellent new tool." QA can run through their testing routine. "That is really thorough." Ultimately, no one person should be responsible for all of this knowledge. Deploying as a team exposes all the members to the roles we play, and offers the chance for someone to ask a question, learn, refine the process, and exchange ideas.

Cost

So by now, you should be saying, "Yes I want to do this. But ... two sets of servers? So I'm paying double?!"

This is true: you will need two systems, but only for as long as the deployment requires. A couple of hours. A day or two. A week. All of the major hosting providers are moving to pay-by-the-hour models, so we pay for only what we use.
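As a rough, purely hypothetical example: if blue runs on four instances at $0.10 per hour, keeping a duplicate green set alive for a 48-hour cutover window adds 4 × $0.10 × 48 = $19.20 to the bill. The overlap is a rounding error next to what the environment costs every month.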

In the end, I think the peace of mind is worth it.