The importance of tracking your cloud costs

By Andreas Spak
Published on 2022-08-10 (Last modified: 2023-08-10)

...

I have worked as a DevOps engineer for many years, on different cloud platforms and for many different clients. Two things that always seem to be trivialized by the people responsible for budgets are monitoring and cost management. It is like documentation for developers: it never seems to be prioritized, and the actual cost of not having proper system documentation is hard to track.

In cloud computing, mistakes almost always show up on the bill at the end of the month, measured in dollars. Most cost drivers on cloud platforms are pretty easy to control, such as the cost of storage in S3 buckets or the CPU consumed by containers in Kubernetes. Some cloud resources and services have a more or less fixed price; others have limits that can be set from IaC when they are provisioned. Still other services are of a more "elastic" nature and need to be monitored closely. Let me show a few examples of why it is so important to monitor and track your running cloud costs.

 

The SNS service

Let's say you build a system where your customers can log in by receiving a one-time code by SMS. This should work for registered customers as well as for new guest customers. For the SMS sending part, you use the AWS SNS service, which lets you send text messages directly from AWS, removing the need to call a third-party service to deliver text messages to your customers. SNS, of course, has the option of setting a limit on how many SMS can be sent out, but this needs to be set fairly high, because you don't want to risk customers not receiving login credentials in the authentication process.

Unfortunately, your login form (protected by JavaScript junk) is attacked, and someone uses your authentication form to send hundreds of thousands of text messages in 24 hours, reaching the (very expensive) upper limit set for this service. Even worse, since you have not set up any monitoring for your cloud costs, you are not notified right away. Instead, you find out from customers who are no longer receiving text messages with login credentials. This is a true story, and it turned out to be an expensive lesson for the client.
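To make the monitoring point concrete, here is a minimal sketch of the kind of check that could have raised the alarm within the hour. It is not what the client ran. The function name `sms_spike`, the `factor` and `floor` parameters and the sample numbers are all assumptions for illustration, and the hourly counts would have to come from your own metrics source.

```python
from statistics import median
from typing import Sequence


def sms_spike(history: Sequence[int], current: int,
              factor: float = 5.0, floor: int = 100) -> bool:
    """Return True when the current hourly SMS count looks abnormal.

    history: SMS counts for recent "normal" hours (hypothetical input).
    current: SMS count for the hour being checked.
    factor:  how many times above the median counts as a spike (assumption).
    floor:   minimum threshold, so tiny baselines do not trigger alerts.
    """
    baseline = median(history) if history else 0
    threshold = max(floor, baseline * factor)
    return current > threshold


# Hypothetical numbers: a normal evening versus an attack.
normal_hours = [40, 55, 60, 48]
print(sms_spike(normal_hours, 70))    # False, within normal variation
print(sms_spike(normal_hours, 9000))  # True, time to wake someone up
```

The median is used as the baseline so that one earlier burst does not drag the threshold upwards. A check like this only tells you that something is wrong; the service limit is still what stops the spending.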

 

NAT Gateway horror

Let me give you another real-life example to further illustrate the point of this article.

You have a few ECS clusters provisioned, running a bunch of microservice containers, with images residing in the GitLab Container Registry. The Container Registry in GitLab is awesome, in my mind, if you use GitLab for running your deployment pipelines. There is no need to deal with pushing images to your cloud platform; you just build and store them right on GitLab. Pulling the images from AWS is super easy through a NAT gateway (since your ECS clusters are not exposed on a public subnet, of course), and it is all very easily configured in your ECS task definitions.

Your microservices use a pretty beefy DocumentDB cluster to store data, and to save some funds, you automate a few processes that shut down and fire up this cluster in the dev and stage environments when these environments are not used, such as after work hours and during weekends. This is good practice, and easily done with the function in DocumentDB that lets you halt a cluster for up to 7 days. DocumentDB is, as far as I know, one of the more expensive managed services available in AWS, so halting it when it is not used will save a lot of money.

At the end of the month, when you take a look at the spending overview for the last 30 days, you see that you have not actually saved any money at all. In the AWS Cost Explorer, you find that the "EC2-Other" category has gone through the roof, and when you look closer, the cost for traffic through your NAT Gateway is way higher than normal. A quick look at the logs for the services running in ECS explains it. Shutting down the DocumentDB cluster causes the health checks for your microservices in ECS to fail (because they expect a working database connection), which triggers a restart of each task, which in turn makes each task download the container image from GitLab again, and again... and again...
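A rough back-of-the-envelope calculation shows how quickly such a restart loop adds up. The sketch below is only arithmetic. The function name and every number in the example are assumptions, including the per-GB NAT Gateway data processing price, which you should replace with the current price for your region. It also assumes that every restart pulls the full image, with no layer caching on the host.

```python
def nat_pull_cost_usd(image_size_gb: float, tasks: int,
                      restarts_per_task_per_hour: float, hours: float,
                      price_per_gb_usd: float) -> float:
    """Estimate NAT Gateway data processing cost caused by repeated image pulls.

    Assumes each restart downloads the whole image through the NAT Gateway.
    """
    pulls = tasks * restarts_per_task_per_hour * hours
    return pulls * image_size_gb * price_per_gb_usd


# Hypothetical scenario: 20 tasks with a 0.5 GB image, each restarting
# every two minutes during a 14 hour overnight database shutdown.
one_night = nat_pull_cost_usd(
    image_size_gb=0.5,
    tasks=20,
    restarts_per_task_per_hour=30,
    hours=14,
    price_per_gb_usd=0.045,  # assumed price, check your region
)
print(f"Estimated cost for one night: {one_night:.2f} USD")
```

With these made-up numbers a single night lands at roughly 189 USD, and that is before weekends, more environments or larger images are taken into account.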

 

If only someone had kept an eye on the costs...

A lot of DevOps engineers will probably say that these examples could easily have been prevented, which is true, but I can assure you that these mistakes were made by experienced teams. However, by monitoring your costs, you can, with fairly simple measures, be notified at an early stage, before costs skyrocket, and take the necessary steps to fix possible issues.

There are many different ways to monitor costs in AWS, but for a start I recommend keeping it simple. Use your estimated budget as the threshold and create budget alerts. For example, you can create a budget alert and get notified in Slack; that way you will get a heads-up quickly if costs deviate from the budget. As any experienced cloud DevOps engineer knows, cost is one of the most important variables when working with cloud computing, but unfortunately it is also one of the most difficult to understand and control. I hope this little article will shed some light on the importance of keeping an eye on your cloud costs.
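For readers who want to see the idea behind a budget alert, here is a minimal sketch of the arithmetic: project the month-to-date spend linearly to the end of the month and build a Slack-style message when the projection exceeds the budget. The function names and numbers are assumptions for illustration. In practice the spend figure would come from your cloud provider's billing data, and a managed budget alert can do the same job without any code.

```python
import calendar
from datetime import date
from typing import Optional


def projected_month_end_spend(spend_to_date: float, today: date) -> float:
    """Linearly extrapolate month-to-date spend to the end of the month."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_to_date / today.day * days_in_month


def budget_alert(spend_to_date: float, monthly_budget: float,
                 today: date) -> Optional[dict]:
    """Return a message payload if projected spend exceeds the budget, else None."""
    projected = projected_month_end_spend(spend_to_date, today)
    if projected <= monthly_budget:
        return None
    # A dict with a "text" key is the basic shape of a Slack webhook message.
    return {
        "text": (f"Cloud spend is projected at {projected:.0f} USD this month, "
                 f"but the budget is {monthly_budget:.0f} USD.")
    }


# Hypothetical numbers: 600 USD spent by the 10th against a 1000 USD budget.
print(budget_alert(600, 1000, date(2023, 8, 10)))
```

A linear projection is crude, but it is enough to flag a runaway cost after a day or two instead of at the end of the month.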

 




About the author



Andreas Spak

Andreas is a DevOps and AWS specialist at Spak Consultants. He is an evangelist for building self-service technologies for cloud platforms, in order to offer clients a better experience on their cloud journey. Andreas also has a strong focus on automation and container technologies.
