Cut your cloud costs

One of the primary reasons for moving to a cloud platform is cost efficiency. To be more specific, the goal is to use the managed services a cloud platform provides while paying less (or at least not more) than you did before. In my article Cut your cloud costs - part 1, I talked about the power of outsourcing parts of your system to managed services. I also scratched the surface of the cloud computing architecture mindset. In this article, I focus on one very important statement:

Utilize the resources you need, when you need them!

This is a huge topic, especially once we get into the technical details. In this article we will therefore stay at a high level and look at some things to consider when designing your cloud environments.

Quality, availability and elasticity

I mentioned earlier that one of the primary reasons for migrating to a cloud platform is cost efficiency. However, a few other, implicit factors make cloud computing a game changer for modern organisations. Without these factors, there would be little point in migrating to the cloud. The bill from the cloud vendor will often be higher than the one from your server hosting company, because your hosting company only provides a virtual server, while your cloud vendor delivers a platform with hundreds of services. Keep this in mind when working with cost management in the cloud.

Quality

For some, this may be a controversial opinion, but moving an architecture to the cloud improves its quality significantly. To be more specific, cloud computing (the services provided by cloud vendors) lets us build robust and dynamic systems that meet the requirements of modern applications with relatively little effort. Sure, the old mainframe systems back in the day were robust and stable too. However, they didn't operate in the same context as modern systems, and they didn't have the same requirements for scalability, integration and flexibility. The file storage service S3 in AWS is one example. By using S3 for storage, you add a high level of availability, security and elasticity to your applications, all of which improve quality. Building your own equivalent of S3 would be extremely difficult and expensive. S3 is just one example of how cloud services can improve the quality of your products.

Availability

Possibly the most significant improvement for applications moving to the cloud is availability. As recently as 10 years ago, having virtualized resources and services available 24/7, 365 days a year, through simple APIs, was a developer's dream. I have been on projects where something as simple as adding a load balancer to my team's applications involved a complicated process that took more than a week to complete. On one particular project, I waited 2 weeks for an admin to give me SSH access so that I could work. With cloud computing, waiting for infrastructure to be provisioned, for systems to be tuned, or for access to be granted and revoked is history. For organizations that value time to market, cloud computing has been a game changer, simply because of this extreme level of availability.

Elasticity

Elasticity may sound like a made-up buzzword, but it simply means scalability on multiple levels, from disk space and network load to RAM and CPU. Cloud computing allows us to scale up and down, both horizontally and vertically. This applies not only to our application-specific technology, but also to infrastructure and services. A high level of scalability is crucial for both productivity and cost efficiency. The need for highly scalable and cost-effective systems is one of the primary reasons why technologies such as Kubernetes were invented.

I think it is safe to say that elasticity as we know it today gained momentum with container technology, like Docker. From there, it evolved into increasingly sophisticated ways of scaling systems up and down, until serverless became a thing. Serverless technology is as elastic as it gets, because it lets you narrow your resource usage down to seconds, or even milliseconds. You use CPU, memory, network, load balancers etc. only for the limited time your application demands them. After that, those resources are free to be used by something else. For example, say you have a service listening to a message queue of some kind, and it only really does anything when it receives messages. Wouldn't it be perfect if you could use resources only for the seconds your service spends handling those messages? Why would you want to pay for a node running 24/7 when you aren't using it all the time?
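To make the queue example concrete, here is a back-of-the-envelope comparison in Python. All prices are hypothetical placeholders, not quotes from any vendor, so substitute your provider's current prices and your own workload figures. The sketch also ignores free tiers, cold starts and data transfer.

```python
# Back-of-the-envelope cost comparison: an always-on node vs. a serverless
# function that only runs while it handles queue messages.
# All prices are HYPOTHETICAL placeholders, not real vendor prices.

HOURS_PER_MONTH = 730  # average number of hours in a month


def monthly_node_cost(price_per_hour: float, hours: float = HOURS_PER_MONTH) -> float:
    """Cost of a node that runs 24/7, whether or not it has work to do."""
    return price_per_hour * hours


def cost_per_message(seconds_per_message: float, memory_gb: float,
                     price_per_gb_second: float,
                     price_per_million_requests: float) -> float:
    """Serverless cost of handling one message: compute time plus request fee."""
    compute = seconds_per_message * memory_gb * price_per_gb_second
    request = price_per_million_requests / 1_000_000
    return compute + request


def monthly_serverless_cost(messages_per_month: int, seconds_per_message: float,
                            memory_gb: float, price_per_gb_second: float,
                            price_per_million_requests: float) -> float:
    """You only pay while messages are actually being handled."""
    return messages_per_month * cost_per_message(
        seconds_per_message, memory_gb, price_per_gb_second, price_per_million_requests)


def break_even_messages(node_price_per_hour: float, seconds_per_message: float,
                        memory_gb: float, price_per_gb_second: float,
                        price_per_million_requests: float) -> float:
    """Monthly message volume at which serverless costs as much as the node."""
    return monthly_node_cost(node_price_per_hour) / cost_per_message(
        seconds_per_message, memory_gb, price_per_gb_second, price_per_million_requests)


if __name__ == "__main__":
    # Hypothetical workload: 100,000 messages/month, 0.5 s each, 0.5 GB memory.
    node = monthly_node_cost(0.05)
    serverless = monthly_serverless_cost(100_000, 0.5, 0.5, 0.00002, 0.20)
    threshold = break_even_messages(0.05, 0.5, 0.5, 0.00002, 0.20)
    print(f"always-on node: ${node:.2f}/month")
    print(f"serverless:     ${serverless:.2f}/month")
    print(f"break-even at roughly {threshold:,.0f} messages/month")
```

With these placeholder numbers, the node costs $36.50 a month and the serverless function about $0.52. Serverless stays cheaper until roughly 7 million messages a month. The exact figures matter less than the shape of the result. For spiky, low-volume workloads, paying per use usually wins, while a steadily busy service may be cheaper on a node.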

It all starts with environments

Building these "utilization rules" into your system architecture is crucial for cost-effective cloud computing, and it all starts with how you set up your different environments. How you think about your environments can make a huge difference to cost saving and resource optimisation. So let's take a closer look at how we could plan our environments in the cloud. Here, I assume that each of these environments runs in its own cloud account (which is strongly recommended).

Dev

The dev environment should primarily be used for development, i.e. it is the primary account for developers. This environment can be subject to lots of changes, POCs, trials of new tools etc. It is often the account that generates most of the cloud-related costs, primarily because of the sheer amount of resources used during development. Resources and services in this account should be easy to downscale when not in use, such as after working hours. Ideally, the entire dev environment should be easy to destroy and recreate using IaC or some self-service tooling. To make this possible, eliminate or at least minimise the number of system integrations and dependencies in this environment, by making them plug-and-play or even mocked.
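A simple way to downscale after working hours is a scheduled job that decides how many replicas (or nodes) the dev environment should have at any given moment. The sketch below is hypothetical. The working hours, the weekday-only rule and the replica count are all assumptions. It also leaves out how the result is applied, whether through your IaC tool, a cron-triggered pipeline or your platform's API.

```python
from datetime import datetime

# HYPOTHETICAL working-hours policy for a dev environment (local time).
WORKDAY_START_HOUR = 7          # scale up at 07:00
WORKDAY_END_HOUR = 19           # scale down at 19:00
WORKING_DAYS = {0, 1, 2, 3, 4}  # Monday..Friday, as returned by datetime.weekday()


def desired_dev_replicas(now: datetime, working_replicas: int = 1) -> int:
    """Return how many replicas the dev environment should run at `now`."""
    in_working_hours = WORKDAY_START_HOUR <= now.hour < WORKDAY_END_HOUR
    if now.weekday() in WORKING_DAYS and in_working_hours:
        return working_replicas
    return 0  # evenings, nights and weekends: scale to zero


def weekly_uptime_fraction() -> float:
    """Share of the week the dev environment is up under this policy."""
    hours_up = len(WORKING_DAYS) * (WORKDAY_END_HOUR - WORKDAY_START_HOUR)
    return hours_up / (7 * 24)


if __name__ == "__main__":
    print(desired_dev_replicas(datetime.now(), working_replicas=2))
    print(f"up {weekly_uptime_fraction():.0%} of the week")
```

With a 12-hour, weekday-only window, the environment runs 60 of the 168 hours in a week. Anything billed by the hour therefore costs roughly 64% less than if it were left on around the clock. If a test account is part of the CI/CD pipeline, the same schedule can drive it as well.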

Keep resources as slim and cost-effective as possible, and make sure not to over-provision. Keep CPU, memory and disks at the level you actually need for development.
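One way to keep resources at the level you actually need is to size them from observed peak usage plus a safety margin, rather than guessing high. The sketch below picks the cheapest size that fits. The size catalog, its names and its prices are hypothetical, and the 20% headroom is an assumption you should tune to your own workload.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Size:
    name: str
    vcpus: float
    memory_gib: float
    price_per_hour: float


# HYPOTHETICAL size catalog; real instance types and prices vary by vendor and region.
CATALOG = (
    Size("small", 1, 2, 0.02),
    Size("medium", 2, 4, 0.04),
    Size("large", 4, 8, 0.08),
    Size("xlarge", 8, 16, 0.16),
)


def pick_size(peak_vcpus: float, peak_memory_gib: float, headroom: float = 0.2,
              catalog: tuple[Size, ...] = CATALOG) -> Optional[Size]:
    """Return the cheapest size that fits observed peak usage plus headroom."""
    need_cpu = peak_vcpus * (1 + headroom)
    need_mem = peak_memory_gib * (1 + headroom)
    fitting = [s for s in catalog if s.vcpus >= need_cpu and s.memory_gib >= need_mem]
    return min(fitting, key=lambda s: s.price_per_hour, default=None)


if __name__ == "__main__":
    size = pick_size(peak_vcpus=1.2, peak_memory_gib=3.0)
    print(size.name if size else "nothing in the catalog fits")
```

The peaks you feed in should come from your monitoring rather than from estimates. Otherwise the headroom simply stacks on top of a guess.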

Test

A dev-like environment, but primarily used for running unit tests and some integration tests. Like the dev environment, the test environment should be easy to downscale or destroy, and have few or no integrations with and dependencies on other systems. If a test account is used in CI/CD processes, such as being part of a deployment pipeline, it needs to be up and running on the same schedule as the dev account. Costs generated by a dedicated test account should typically be lower than those of the dev account, because fewer resources and services are provisioned in it.

Stage

This environment is dedicated to running more sophisticated tests, such as serious integration tests and end-to-end tests, and should be as production-like as possible. Normally, this account is used less frequently than the dev and test accounts, and should be downscaled or destroyed when not in use. As with the test account, if it is part of a deployment pipeline, it needs to be operational whenever CI/CD processes are active. Systems running in this environment often need real interaction with other systems to perform integration and end-to-end tests. As a result, downscaling it, or destroying and recreating it, often involves a more complicated process than with the dev and test environments.

In my experience, most organisations can reduce costs by running end-to-end tests less frequently, and by downscaling or destroying the stage environment when it is not in use. A stage environment should be as production-like as possible, which often means generously provisioned resources (high CPU and memory, databases with large datasets, multiple pod replicas etc.). The costs generated in the stage account can therefore be quite hefty.
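The effect of running end-to-end tests less often, and only keeping the stage environment up while they run, is easy to estimate. In this sketch, the hourly cost of the stage environment, the number of runs and the run durations are all hypothetical. It assumes that test runs never overlap and that each run's duration includes spin-up and teardown.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def always_on_cost(hourly_cost: float) -> float:
    """Stage environment left running all month."""
    return hourly_cost * HOURS_PER_MONTH


def on_demand_cost(hourly_cost: float, runs_per_month: int, hours_per_run: float) -> float:
    """Stage environment created for each end-to-end run and destroyed afterwards.

    Assumes runs never overlap and that hours_per_run includes spin-up and teardown.
    """
    return hourly_cost * runs_per_month * hours_per_run


if __name__ == "__main__":
    hourly = 10.0  # HYPOTHETICAL cost per hour of a production-like stage environment
    print(f"always on:            ${always_on_cost(hourly):,.0f}")
    print(f"e2e on every merge:   ${on_demand_cost(hourly, 200, 1.5):,.0f}")
    print(f"e2e once per workday: ${on_demand_cost(hourly, 22, 2.0):,.0f}")
```

With these placeholder figures, an always-on stage environment costs $7,300 a month. Creating it for each of 200 merges costs $3,000, and running end-to-end tests once per workday costs $440.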

Prod

This is the account where all systems run in production, which means high redundancy, high capacity and high availability. Cost drivers can be anything from large databases to applications demanding high CPU and memory. Still, services running in production can be very cost-effective if you use serverless technologies, container technologies with auto scaling (such as Kubernetes), and the different payment plans offered by the cloud vendors. I will discuss these in more detail in the coming articles in this series, so stay tuned!
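Here is an illustration of how container auto scaling keeps production capacity in line with demand. It is a simplified version of the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The calculation is skipped when the ratio is within a tolerance (0.1 by default), and the result is clamped to the configured minimum and maximum. The real controller also handles stabilization windows, missing metrics and pod readiness, all of which this sketch leaves out.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int, max_replicas: int, tolerance: float = 0.1) -> int:
    """Simplified HPA-style calculation: scale replicas in proportion to load."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: avoid churn
    else:
        desired = math.ceil(current_replicas * ratio)
    # Never go below the floor (redundancy) or above the ceiling (cost cap).
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against a 60% target: scale out.
    print(desired_replicas(4, 90, 60, min_replicas=2, max_replicas=10))
    # Quiet night, 4 pods at 10% CPU: scale in, but keep the minimum of 2.
    print(desired_replicas(4, 10, 60, min_replicas=2, max_replicas=10))
```

The minimum replica count is where production redundancy lives, while the maximum effectively acts as a cost ceiling.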

Conclusion

To make cloud computing as cost-effective as possible, you need to design your systems with a cloud computing mindset. This sounds much easier than it actually is. But by following a few guidelines, you will be better prepared to tackle the technical challenges ahead with a cost management mindset.

Make sure you can scale down your applications, services, systems etc., as much as possible, when not used. Remember: utilize the resources you need, when you need them.
Keep a clear separation between your different environments, i.e. dev, test, stage and prod.
Make sure that cloud resource utilisation and cost is a factor in your system design.
Consider serverless technology when possible.
Be conscious of how many resources your services require.

In the next article in this series, I will get a little more technical, and discuss the use of containers in AWS, from a cost management perspective.