Cut your cloud costs - part 3

By Andreas Spak
Published on 2021-12-13 (Last modified: 2023-09-14)

...

In this third article about cloud cost management, we're having a look at best practices around infrastructure-as-code (IAC). To be able to maintain our cloud computing stack using IAC, we must put some thought into planning how to provision infrastructure and cloud services. IAC is a lot about workflows, and in this article I describe different patterns for structuring the IAC so that it is easier to implement good workflows. When I say IAC, I implicitly refer to an IAC code project, for example a repository on GitHub, or another way of working with IAC code, such as workspaces in Terraform Cloud. I'm using Terraform on AWS in this article, but the patterns described are the same regardless of which cloud vendor you use.

Now, there are many different architectural workflow patterns for IAC, based on different requirements, such as:

  • Access, security and governance.
  • Team autonomy.
  • Available cloud computing skills within the organisation.
  • Cost optimisation.
  • Size of organisation. 

When I started working with IAC, it took me quite a while to understand the pros and cons of the different patterns, and I still find it to be a challenging part of the cloud system design process. The challenge is to come up with workflows that fit the organisation's needs and requirements, while getting the most out of the cloud platform. Let's have a look at a few common patterns for IAC and get a bit familiar with them.

 

Expert teams pattern

This IAC pattern is based on the idea of having expert teams focus on different parts of the technical stack. For example, an organisation could have a dedicated database team, with database experts responsible for managing the databases of multiple applications. An OPS team would focus on more low-level infrastructure, such as networking, and would typically be responsible for VPCs, security groups, domains etc. This pattern encourages isolation of workflows, i.e. one team is responsible for a specific part of the IAC.

The benefit of structuring your IAC this way is that an organisation can create very specialised teams with a high level of expertise, and let these teams focus on specific parts of the technical stack. For example, the database team is likely to make better decisions about what type of database should be used for different purposes, and about database security, backups and logging, i.e. everything that has to do with databases. However, for the same reasons, it requires a high level of communication between teams, since this pattern enforces encapsulation of workflows. For instance, if the application team needs an RDS instance, it does not have direct access to provision one itself, so it has to communicate with the database team to get it. The database team may in turn need to communicate with the security team, in order to open up security groups and create roles for the RDS instance.

 

 

This drawing illustrates workflows between expert teams and IAC projects. The dotted lines describe cross-communication between teams.
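
To make the hand-over between teams a bit more concrete, here is a minimal Terraform sketch of what the database team's IAC project could look like. It is only an illustration under my own assumptions: the resource name `orders`, the variables `db_security_group_id` and `db_password`, the engine, the instance size and the output name are all hypothetical. The security group ID comes in as a variable because, in this pattern, the security group itself is owned by another team's project.

```hcl
# Database team project (hypothetical layout): owns the RDS instance.

# Handed over by the team that owns security groups; not created here.
variable "db_security_group_id" {
  description = "ID of the security group that allows access to the database"
  type        = string
}

variable "db_password" {
  description = "Master password, supplied outside of version control"
  type        = string
  sensitive   = true
}

# Illustrative values only; engine and sizing are the database team's call.
resource "aws_db_instance" "orders" {
  identifier             = "orders-db"
  engine                 = "postgres"
  instance_class         = "db.t3.micro"
  allocated_storage      = 20
  username               = "orders"
  password               = var.db_password
  vpc_security_group_ids = [var.db_security_group_id]
  skip_final_snapshot    = true
}

# The only thing the application team needs from this project.
output "orders_db_endpoint" {
  description = "Connection endpoint for the orders database"
  value       = aws_db_instance.orders.endpoint
}
```

The application team never works in this project. It only consumes the `orders_db_endpoint` output, and every change to the instance itself has to go through the database team, which is exactly where the cross-team communication comes in.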

 

IAC by services pattern

For smaller organisations, it is common to have one IAC project bootstrapping the cloud computing architecture, and one or more application-specific or service-specific IAC projects. For example, the bootstrap project could provision VPCs and networking, ECS clusters, basic IAM, load balancers etc., while the infrastructure and cloud services needed to run the applications would have their own IAC projects, provisioning Lambdas, databases, S3, service-specific security groups and IAM, and anything else that only the specific service needs to do its job.

Interdependencies between different parts of the IAC, caused by functional requirements on the services and applications, are almost always impediments to creating workflows that can easily destroy and create parts of the IAC. This pattern enforces encapsulation of the IAC "by service", reducing the urge to create interdependencies between teams and projects. From a cost management perspective, this is ideal, because anything that can be destroyed easily does not have to keep running when it is not needed.

In terms of autonomy, this pattern is a clear winner, because it encourages (or even enforces) workflows that require teams to implement a true devops culture, maintaining all their cloud services themselves. For larger organisations, though, it can be challenging to give each team enough devops firepower to maintain large-scale cloud operations.

 

 
 
In this example, all teams have workflows set up to work on the bootstrap IAC project, but each team is responsible for the IAC necessary to maintain its own software projects. These application IAC projects are isolated from each other, and can be created and destroyed independently, without affecting other parts of the cloud computing stack.
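
As a minimal sketch of this isolation, a service project could read what it needs from the bootstrap project and own everything else itself. I'm making several assumptions here: that the bootstrap state is stored in an S3 backend, that the bootstrap project exposes an output called `vpc_id`, and that the bucket names, state key, region and the `orders` service name look like this. All of them are hypothetical.

```hcl
# Service project for a hypothetical "orders" service.

# Read-only lookup of the bootstrap project's outputs.
data "terraform_remote_state" "bootstrap" {
  backend = "s3"

  config = {
    bucket = "example-terraform-state"     # hypothetical state bucket
    key    = "bootstrap/terraform.tfstate" # hypothetical state key
    region = "eu-west-1"
  }
}

# Service-specific security group, placed in the shared VPC.
resource "aws_security_group" "orders_service" {
  name   = "orders-service"
  vpc_id = data.terraform_remote_state.bootstrap.outputs.vpc_id
}

# Service-specific storage, owned and destroyed together with the service.
resource "aws_s3_bucket" "orders_uploads" {
  bucket = "example-orders-uploads"
}
```

Because the only link to the rest of the stack is a read-only lookup of `vpc_id`, running `terraform destroy` in this project removes the service's own resources without touching the bootstrap project or any other service.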

 

Super-admin devops team pattern

In larger organisations, with many autonomous teams, it is common to have one super-admin devops team of cloud computing specialists, and multiple project or application teams. The goal of this pattern is to encourage workflows that enable an organisation to have as much control as possible over the most vital parts of the cloud computing stack, while still maintaining some level of autonomy for the application teams. For example, teams could have close to full power-user access in the development and testing environments, limited access in the staging environment, and access to the production environment only through CI/CD, with Sentinel policy checks in front. Organisations implementing this kind of workflow get a high level of control over their cloud computing operations and (in theory at least) a higher level of security. It can also be a huge advantage to have one single team of cloud computing specialists, working together as one specialised unit. In the "IAC by services pattern" example, I mentioned that each team must have skilled devops people to gain full autonomy. By having a dedicated team of devops specialists, an organisation can build up a high level of skill around cloud computing, which is more difficult if the devops specialists are spread out over different teams and projects.

From a cost management perspective, the workflows encouraged by this pattern can be very effective, because they give the devops team control over which infrastructure and cloud services are provisioned.

The major problem with this approach, the way I see it, is the complexity that comes with trying to implement efficient workflows based on this pattern. Autonomy is one of the key variables for successful cloud computing, and when workflows lock down access to the IAC, the level of autonomy for the application teams drops drastically. How to enforce encapsulation of the different workflows is beyond the scope of this article, but whether it is done using IAM, managed solutions such as Terraform Cloud, self-service frameworks or a mixture of all of these, it will add multiple layers of complexity. If not designed and implemented correctly, workflows can grow to become too complex and eventually too costly to maintain.

 
 
 
Illustration of the workflows when parts of the cloud computing stack are not directly accessible to teams other than the devops team. This drawing is extremely simplified in order to explain the base pattern. IAC inside the green dotted lines is directly accessible to the application teams, while IAC inside the red dotted line is only accessible to the super-admin devops team.
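
The per-environment access levels from the example earlier in this section can be sketched in Terraform with IAM. This is only one possible way to do it, under my own assumptions: the variable names and the environment names are hypothetical, "limited access" in staging is interpreted as read-only, and the CI/CD role and the Sentinel checks for production are left out. `PowerUserAccess` and `ReadOnlyAccess` are AWS managed policies.

```hcl
variable "environment" {
  description = "Environment this configuration is applied to"
  type        = string
}

variable "app_team_role_name" {
  description = "Name of the IAM role that application team members assume (hypothetical)"
  type        = string
}

locals {
  # Assumed mapping of environment to the policy the application team gets.
  # Production is deliberately missing: no direct access, only CI/CD.
  app_team_policy_arns = {
    development = "arn:aws:iam::aws:policy/PowerUserAccess"
    testing     = "arn:aws:iam::aws:policy/PowerUserAccess"
    staging     = "arn:aws:iam::aws:policy/ReadOnlyAccess"
  }
}

# Attaches the policy for the current environment, or nothing at all
# when the environment has no entry in the mapping above.
resource "aws_iam_role_policy_attachment" "app_team" {
  for_each = {
    for env, arn in local.app_team_policy_arns : env => arn
    if env == var.environment
  }

  role       = var.app_team_role_name
  policy_arn = each.value
}
```

Even a small sketch like this hints at the extra layers the pattern brings with it: somebody has to own the mapping, the roles and the CI/CD pipeline that remains the only way into production.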

 

Conclusion

In this article I have talked about how to structure IAC in order to be able to create good workflows. Obviously, IAC is only the "interface" to thousands of different infrastructure components and hundreds of cloud services, and the patterns described are only abstractions to help us design a good cloud computing architecture at a high level. But with a few base patterns as a start, we can continue the thought process, weigh pros and cons, and create designs that fit our organisation's needs. By creating good workflows we can ensure that we are able to follow the "utilize the resources you need, when you need it" rule, which is crucial for successful cloud computing.

If you have not read the first articles in my Cut your cloud costs series, about the cloud computing mindset and some thoughts on resource utilisation, these can help shed some more light on how to master the art of cost-efficient cloud computing.




About the author



Andreas Spak

Andreas is a Devops and AWS specialist at Spak Consultants. He is an evangelist for building self-service technologies for cloud platforms, in order to offer clients a better experience on their cloud journey. Andreas also has a strong focus on automation and container technologies.
