It’s finally time to close the loop. By now we’ve discussed the motivation, our end goal, automating the creation of infrastructure, and provisioning machines, which leaves us with a functioning environment running our application. Our last step in the process is to automate deployments.
The term continuous delivery (CD) refers to the practice of deploying new versions of an application automatically upon detecting changes to its source repository. That process roughly breaks down into the following steps:
- Source – Detect a source change; pull the source.
- Build – Build artifacts from the updated source code.
- Test – Test the build artifacts.
- Approve – Obtain manual approval (optional).
- Deploy – Deploy the build artifacts to an environment.
If any one of these steps fails, the process stops and a failure alert goes out.
Many web applications rely on multiple microservices and environments, each of which requires its own deployment process. Since continuous delivery refers to the overall approach, we use another term for the concrete implementation of these steps for each application: a deployment pipeline.
These deployment pipelines run continuously on a server. A number of tools exist to host and run them for you. Google provides Cloud Build, and AWS provides CodePipeline. You also have quite a few options from third-party vendors: CircleCI, Travis CI, CodeShip, and GitLab, to name a few. For the bold and daring, you can always set up your own deployment pipelines on a classic self-hosted Jenkins instance. You’ve got quite a few choices, all of which solve roughly the same problem.
That’s all fine and dandy, but it’s not quite enough detail to get everything up and running, so let’s go deeper into each step in a deployment pipeline.
Step 1. Source
The purpose of this step is twofold: to kick off a deployment, and to retrieve the source code. These two actions tend to go hand in hand, so despite being two separate tasks, they’re generally combined into a single step.
Most source control services provide a mechanism called webhooks to announce changes to a repository. Whenever you push a change to a repository, the service makes an HTTP request containing the change metadata to a URL that you specify. This lets you react to repository changes on demand.
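As a sketch of how a webhook receiver might work, here’s a minimal listener using Python’s standard library. The payload field name (`after`) follows GitHub’s push event; other providers use different shapes, so treat the parsing as an assumption:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class WebhookHandler(BaseHTTPRequestHandler):
    """Receives push events and records the commit to deploy."""

    latest_commit = None  # updated by the most recent push event

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        # GitHub's push event carries the new head commit in "after";
        # other providers name this field differently.
        WebhookHandler.latest_commit = payload.get("after")
        self.send_response(204)
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the example quiet

def serve(port=8080):
    """Block forever, handling webhook deliveries."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

A real receiver would also verify the webhook signature before trusting the payload, and would kick off the build step instead of just recording the hash.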
If you don’t have an event-driven mechanism like this available, you can always fall back to polling the repository for changes: store the most recently seen commit hash on your deployment machine, fetch the remote’s latest commit hash every couple of minutes, and pull the source whenever it differs.
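The polling fallback can be sketched like this. `git ls-remote` is one real way to ask a remote for its head commit without cloning; the `Poller` keeps the last seen hash so each tick only reports actual changes:

```python
import subprocess

def remote_head(repo_url, ref="HEAD"):
    """Ask the remote for its current commit hash without cloning.
    `git ls-remote` prints "<hash>\t<ref>" lines; we take the hash."""
    out = subprocess.run(
        ["git", "ls-remote", repo_url, ref],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()[0]

class Poller:
    """Tracks the last seen hash; tick() reports whether it changed."""

    def __init__(self, fetch_head):
        self.fetch_head = fetch_head  # e.g. lambda: remote_head(url)
        self.last_seen = fetch_head()

    def tick(self):
        head = self.fetch_head()
        changed = head != self.last_seen
        self.last_seen = head
        return changed
```

In a deployment loop you would call `tick()` every couple of minutes and trigger a pipeline run whenever it returns `True`.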
You likely won’t have to worry about this step if you use an out-of-the-box tool. Just think of it this way:
- Input: An event signaling a change in your repository.
- Output: The latest source code.
Step 2. Build
You’ve got the source code. Now you have to build it. The complexity of this step varies considerably based on your setup.
It’s important to note that build processes often produce binaries specific to an operating system. If you run a build on Ubuntu that will get deployed to a Windows server, you probably won’t have a good time. Those binaries won’t run. You’ll have to make sure that you build your source on the same operating system you run in production.
Hosted continuous delivery services may or may not let you configure this. You’ll have to evaluate each one and test to make sure it works for your application.
You have a bit more flexibility with Jenkins, which you can install on any operating system, and which provides a number of ways to spin up workers running various operating systems. Hopefully you don’t run into this problem, but it’s worth mentioning so that you don’t run into some painful errors far down in the process.
Once you have all of your binaries built, it’s time to test them.
- Input: The latest source code
- Output: Build artifacts
Step 3. Test
This phase is pretty self-explanatory, though the extent of testing can vary widely depending on your needs. Some companies run full UI, integration, and unit test suites here. This can take a long time. Some companies just run unit tests. This shouldn’t take long (cough). Some companies don’t test at all. That’s instant. Either way, you have a lot of options, and they generally trade stability for reduced build times.
Keeping in mind that your build should run on a fresh server that’s nearly identical to your production environment, running tests is pretty simple. Standard practice is to have a script checked into your repository that runs your tests and returns the appropriate error code upon failure. This makes it easy for continuous delivery pipelines to run your tests: since the source is pulled and all dependencies are installed, just call that command from the test step in your pipeline.
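A minimal version of that pattern: run the repository’s test command and propagate its exit code so the pipeline step fails exactly when the tests do. The pytest invocation in the comment is an assumption; substitute whatever test runner your repository checks in:

```python
import subprocess
import sys

def run_tests(command):
    """Run the repository's checked-in test command and report its exit
    code. A pipeline step that exits with this value fails exactly when
    the tests do."""
    return subprocess.run(command).returncode

# In a pipeline's test step you might call, for example:
#   sys.exit(run_tests([sys.executable, "-m", "pytest"]))
# (pytest is an assumption here; use your repo's actual test runner.)
```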
- Input: Build artifacts
- Output: Build artifacts
Step 4. Approve (optional)
Again, depending on your risk tolerance, you can set up an approval step. This generally sends out an email to a group of people requesting sign-off for a deployment. The email may or may not contain a link to the application in question. I won’t go into much detail here, since it’s more about business processes than actual deployment.
- Input: Build artifacts, Approval status
- Output: Build artifacts
Step 5. Deploy
This is the difficult step. It’s where you take your built and tested application and push it out to an environment. It’s also the step with perhaps the least consensus on the correct approach, so I’ll just give you my thoughts, with that disclaimer.
As always, I like considering the historical approach before jumping into modern alternatives. Before cloud services and widely available virtual machine provisioning, we had bare metal machines. They just sat there, and you connected to them and installed software. Running deployments meant updating these machines, so let’s talk about what that looked like.
Even when starting out in the cloud, this type of deployment makes a lot of sense, especially if you don’t have any automation in place yet. You might be running one or two machines without autoscaling set up, so your deployment targets are largely static.
Setting up a live deployment looks something like this:
- Upload build artifact to a new, temporary directory.
- Point production code symbolic link at this temporary directory.
- Restart relevant application services.
- Delete old deployments.
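The steps above can be sketched in a few lines of Python. The paths, timestamped release layout, and retention count are illustrative, and the service restart is left as a comment since the mechanism varies by stack:

```python
import os
import shutil
import time

def deploy(artifact, deploy_root, live_link, keep=3):
    """Copy a built artifact into a fresh, timestamped release directory,
    atomically repoint the live symlink at it, and prune old releases.
    (`artifact` is a directory here; in practice it might be a tarball
    you extract instead.)"""
    release = os.path.join(deploy_root, time.strftime("%Y%m%d%H%M%S"))
    shutil.copytree(artifact, release)

    # Build the new link beside the old one, then rename over it:
    # os.replace is atomic on POSIX, so requests never see a missing
    # or half-updated live path.
    tmp_link = live_link + ".tmp"
    os.symlink(release, tmp_link)
    os.replace(tmp_link, live_link)

    # Restarting the application service would happen here, e.g. via
    # systemctl or your process supervisor.

    # Keep the last few releases around for a quick manual rollback.
    for old in sorted(os.listdir(deploy_root), reverse=True)[keep:]:
        shutil.rmtree(os.path.join(deploy_root, old))
    return release
```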
It’s easy to think about this and to set it up. Tools like Fabric and Capistrano handle a lot of the mess for you. Unfortunately, the devil is in the details. Let’s think about what might go wrong:
- The source code now requires a globally installed package with a version that conflicts with a previously installed global package.
- A soft restart fails due to a port not having closed in time.
- A hard restart terminates a crucial customer transaction.
These are highly probable events, and they apply to most applications. Your specific application probably has its own unique ways of breaking down during a deployment.
In addition to this, the deployment process won’t create new servers in the case of increased traffic, and it won’t recover in the case of failure. Since we’re building a modern web application, we definitely want that functionality.
Even more importantly: we have to worry about changing state on a single server. This adds complexity and uncertainty. We might run into a bug due to shared state between versions on a single server throwing an error that’s incredibly difficult to reproduce. That can take considerable time to diagnose and fix.
We’ve discussed the alternative to this already: deploying immutable infrastructure. You can already push a button and go from having no servers to having your entire infrastructure deployed from source. That’s an incredible accomplishment, and it enables you to deploy new application versions from scratch every time. You don’t have to reboot anything: just start it once.
Let’s change gears and discuss deploying immutable infrastructure. Rather than updating existing servers, you’ll create new ones from scratch and then connect them to your environment’s load balancer.
If you didn’t care at all about downtime, you could just run the following commands to deploy:
```
packer build packer.json
terraform apply -auto-approve
```
Yes. That’s correct. You can literally just run those commands we ended with in the previous post. That will execute a basic deployment.
- Pull the latest source
- Build a new AMI from it
- Provision two new machines
- Delete the old machines
- Point your load balancer at the two new machines
Provided that your deployment pipeline has access to all of the necessary environment variables, this would actually work pretty well. It doesn’t support rollbacks without some manual effort, but it would likely do the trick for a small, low-traffic site.
These two approaches illustrate perhaps the simplest mutable and immutable deployment techniques. Now let’s consider a few popular improvements on this process.
One of the problems with the simple immutable deployment is that in the case of a runtime failure, there’s no easy way to roll back. You would have to change the AMI in Terraform to an old version and then apply the changes. And this might not even solve the problem.
Okay, fine. Compared to a manual rollback, that really wasn’t too difficult. Regardless, it introduced downtime. If you didn’t have proper alerting set up, you may not have noticed the issue for an hour, and then you had to figure out how to roll back, and then run the commands. That could have taken another 30 minutes.
Blue/green deployments solve this problem by running two environments simultaneously: a blue environment and a green environment. To deploy, you create a new environment (we’ll call it the green environment) and test it. Once you’re happy, you swap the machines behind the live environment’s load balancer (we’ll call it the blue environment) with the green environment’s machines. You don’t delete any machines.
How does this help? Let’s say the newly live environment starts throwing errors. To roll back, you swap the blue and green environments’ machines again. That’s an instant rollback, and you can be reasonably sure the old environment still works. That’s the power of blue/green deployments.
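The mechanics reduce to a pointer swap. Here’s a toy model, with the actual cutover (swapping load balancer backends, target groups, or DNS) abstracted away:

```python
class BlueGreen:
    """Minimal model of blue/green: two environments, one marked live.
    Deploying stages new machines into the idle environment; cutover
    (and rollback) is a pointer swap, not a machine replacement."""

    def __init__(self, blue, green):
        self.pools = {"blue": blue, "green": green}
        self.live = "blue"

    @property
    def idle(self):
        return "green" if self.live == "blue" else "blue"

    def stage(self, machines):
        # Deploy the new build into the idle environment and test it here.
        self.pools[self.idle] = machines

    def cutover(self):
        # Instant swap; the old environment stays intact for rollback.
        self.live = self.idle
```

Calling `cutover()` a second time is the rollback: it points traffic straight back at the untouched old environment.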
Another popular technique is the rolling update. Rather than spinning up two environments side by side, new machines are added to the existing load balancer while old machines are taken out. This process is very similar to our original approach, except that it has no downtime. Rolling back means phasing old machines back in while taking new ones out.
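A rolling update is essentially a loop that adds a batch of new machines, gates on their health, and retires the same number of old ones. A simulation over plain lists, with a pluggable health check (the real version would register and deregister instances with your load balancer):

```python
def rolling_update(pool, new_machines, batch=1, is_healthy=lambda m: True):
    """Replace the machines behind a load balancer a batch at a time:
    register new machines, gate on their health, then drain and remove
    the same number of old ones, so capacity never dips."""
    old = list(pool)
    for i in range(0, len(new_machines), batch):
        incoming = new_machines[i:i + batch]
        pool.extend(incoming)  # register the new batch
        if not all(is_healthy(m) for m in incoming):
            raise RuntimeError("unhealthy batch; stop and roll back")
        for _ in incoming:
            pool.remove(old.pop(0))  # drain and deregister an old machine
    return pool
```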
This is pretty straightforward.
It’s worth mentioning a few important points:
- All of these methods assume you’re deploying stateless servers.
- Database migrations become much trickier in an automated deployment process. If you’re running a database with a schema, default to backwards-compatible schema changes. If you absolutely must make a non-backwards-compatible change, do it in phases (and make sure to back up after each phase) so that each version of the application runs without downtime.
- Running servers may have clients connected. You’ll need to make sure you don’t terminate these servers while clients are still connected. You can avoid this problem by using connection draining (see: AWS, Google Cloud).
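The phased approach to a non-backwards-compatible schema change in the second bullet (sometimes called expand/contract) might look like this. The table and column names are made up for illustration:

```python
# Renaming a column, split into phases that are each safe to deploy on
# their own, so every running application version keeps working.
PHASES = [
    # Phase 1: add the new column; old code keeps writing the old one.
    "ALTER TABLE users ADD COLUMN full_name TEXT;",
    # Phase 2 (an app deploy, not SQL): write both columns, read the
    # new one.
    # Phase 3: backfill rows the old code wrote in the meantime.
    "UPDATE users SET full_name = name WHERE full_name IS NULL;",
    # Phase 4: once no running version reads the old column, drop it.
    "ALTER TABLE users DROP COLUMN name;",
]
```

Backing up between phases means any single phase can be reverted without taking down the versions of the application still in flight.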
That should do for now. If you’re interested in seeing what a Cloud Build or CodePipeline definition looks like, check out one of my previous posts on setting them up from scratch:
- Continuous Delivery With AWS Beanstalk, CodePipeline and Terraform
- Continuous Delivery Using Google Kubernetes Engine and Google Cloud Build
Learning how to build a production web application can take a lot of digging. One of the problems I faced when I first started was that every tool promised to solve every problem, so it was difficult to figure out what the big picture looked like. If that’s you, then hopefully these posts help clarify that bigger picture as you evaluate tools.
It’s been a long series, but I think it’s enough to serve as a solid foundation for the rest of the blog. These posts were intended to outline the high-level concepts that underlie future posts, which will contain more concrete implementation details or nuanced use cases. I think the series has done that well enough.
I’d love to hear what you thought about the series so far. Let me know by commenting below or sending me a direct email through my newsletter.