Managing infrastructure costs as your company grows

Main illustration: Annu Kilpeläinen

Startups are designed to grow quickly, but high growth rates can generate huge costs as new infrastructure is introduced and scaled to meet demand.

How can we maximize returns without introducing developer friction and slowing company growth rate?

Cloud costs can be awkward and are constantly evolving, as proven by the growing world of cloud financial management (FinOps). At Intercom, we use a strong set of foundational principles to guide and align our organization in times of rapid growth. Our cost program is no different – we lay out our intentions and expectations by setting clear principles that allow us to maximize our cost efficiency while growing and serving new customers at scale. So, what are our principles and processes?

1. Take a reactive approach to cost management

At Intercom, shipping is our heartbeat. We believe in the benefits of continuous deployment. For a company to keep growing, it has to keep iterating and delivering value for new and existing customers.

Our philosophy is to ship first and optimize later. The risk is that new features could cause costs to spiral out of control, but we find that when this strategy is implemented alongside our other principles, reactive cost management means we spend more time building and serving customer needs.

Instead of dedicating precious engineering time and resources to creating cost estimates (which may not even match the reality of production), comparing them against carefully forecasted budgets, and seeking the necessary approval to go ahead, our engineers build from the outset. Once the feature or product is up and running, we have a true and accurate picture of its costs and of how it will affect our bottom line. This gives us realistic, useful data and guides us nicely to our next principle.

2. Prioritize by impact

As a result of our reactive strategy, we are gifted with a backlog of cost optimization opportunities. The key is identifying which of these opportunities to pursue, and prioritizing by impact. Build out an evaluation framework that uses inputs that matter to your team. For example, at Intercom, we primarily focus on return for effort: the estimated savings we will make per engineering hour required to execute the work.

We evaluate each and every opportunity in our backlog against this framework, using known blockers and constraints as effort inputs, and prioritizing projects based on projected return per investment. This allows us to concentrate resources where they can be most impactful, freeing up other teams to focus on product delivery and customer satisfaction.
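To make the return-for-effort idea concrete, here is a minimal Python sketch of how a backlog could be ranked by estimated savings per engineering hour. The `Opportunity` class, its field names, and the sample figures are all hypothetical, made up for illustration; they are not Intercom's actual framework or data.

```python
from dataclasses import dataclass


@dataclass
class Opportunity:
    """One cost optimization idea from the backlog (hypothetical shape)."""
    name: str
    est_monthly_savings: float    # assumed unit: dollars per month
    est_engineering_hours: float  # effort estimate, with known blockers factored in

    def __post_init__(self):
        # Guard against division by zero when computing the ratio.
        if self.est_engineering_hours <= 0:
            raise ValueError("effort estimate must be positive")

    @property
    def return_for_effort(self) -> float:
        # Estimated savings per engineering hour required to do the work.
        return self.est_monthly_savings / self.est_engineering_hours


def prioritize(backlog):
    """Return the backlog sorted from highest to lowest return for effort."""
    return sorted(backlog, key=lambda o: o.return_for_effort, reverse=True)


# Illustrative numbers only.
backlog = [
    Opportunity("Right-size cache cluster", 4000, 80),
    Opportunity("Delete unattached volumes", 600, 4),
    Opportunity("Move logs to cheaper storage tier", 2500, 40),
]

ranked = prioritize(backlog)
for o in ranked:
    print(f"{o.name}: {o.return_for_effort:.1f} saved per engineering hour")
```

In this made-up backlog the smallest project by absolute savings ranks first, because it needs so little effort. That is the kind of trade-off a return-for-effort framework is meant to surface.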

3. Centralize governance

Instead of trying to democratize complex, per-product billing knowledge across your team, create a small team of engineers to navigate the world of cloud computing. This team can take on the everyday management of your infrastructure costs, for example:

  • Developing a savings plan policy for your top-tier infrastructure.
  • Exploring new or different instance options for performance and cost benefit.
  • Assessing optimizations of wide-scale operations, e.g. choosing the right auto-scaling policies for your workloads.
  • Crafting tagging policies and implementing them to provide visibility on spend.

Having this specialized group allows your product teams to focus on costs only when absolutely necessary. The more you centralize cost governance, the less time you have to spend on it. Thanks to the evolution of our cost program, we now spend only one hour a week on cost management with great success.

4. No surprises

One of the challenges of a reactive program is monitoring ongoing infrastructure costs and overall spend so your finance team doesn’t get any nasty surprises. Once you’ve spun up your centralized costs team, one of their primary functions should be monitoring your ongoing spend.

We’ve built our own in-house AWS cost monitoring, but there are plenty of solutions on the market to do this for you, as well as your cloud provider’s own tooling. Take the insights from these tools and create a regular review cadence to evaluate, contextualize, and act on any significant fluctuations.

At Intercom, we do this once a week, studying percentage increases and decreases in our top-tier services. This helps us catch any low-level hygiene problems causing costs to spike, and sets stakeholder expectations about our upcoming invoices.
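A weekly review of this kind boils down to comparing this week's spend per service against last week's. The Python sketch below shows one way that comparison might look. The service names, the figures, and the idea of picking the top services by current spend are assumptions for illustration, not a description of our in-house tooling.

```python
def percent_change(previous, current):
    """Percentage change from previous to current; None if there is no baseline."""
    if previous == 0:
        return None
    return (current - previous) * 100.0 / previous


def weekly_review(last_week, this_week, top_n=3):
    """Week-over-week change for the top_n services by this week's spend."""
    top_services = sorted(this_week, key=this_week.get, reverse=True)[:top_n]
    return {
        service: percent_change(last_week.get(service, 0.0), this_week[service])
        for service in top_services
    }


# Illustrative weekly spend per service, in dollars.
last_week = {"compute": 1000.0, "database": 500.0, "storage": 200.0, "queue": 10.0}
this_week = {"compute": 1100.0, "database": 450.0, "storage": 200.0, "queue": 30.0}

weekly_report = weekly_review(last_week, this_week)
for service, change in weekly_report.items():
    label = "new service" if change is None else f"{change:+.1f}%"
    print(f"{service}: {label}")
```

Limiting the review to the largest services keeps the weekly session short. A small service that triples in cost, like the hypothetical `queue` here, is deliberately left out of the report.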

5. Zero touch costs

This one speaks for itself. Practices like our Zero Touch Ops approach to engineering show us that automation is a huge timesaver, particularly for a program that doesn’t always have full engineering capacity. Start out small and automate your reporting functionalities and basic cost checks. As you learn more about your costs, you can start to build more sophisticated automations.

Where possible, automate manual tasks that require a detailed knowledge of costs or systems. Make your processes accessible by having your automations do all the heavy lifting.

An example would be the in-house monitoring system mentioned above. The check itself is a basic API call that stores variances in a spreadsheet – the really valuable automation is the alert we’ve built from it. When costs spike above a certain threshold, an alarm is triggered in our Slack channel. Instead of checking the spreadsheet every day, we use Slack to tell us what we need to know, only when we need to know it.
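As a rough sketch of the alerting step, the Python below filters variances against a threshold and builds a message for a Slack incoming webhook, which accepts a JSON body with a `text` field. The function names, the 20% threshold, and the sample variances are hypothetical. The variances are held in a plain dictionary here rather than a spreadsheet, and the webhook URL is something you would supply yourself.

```python
import json
import urllib.request


def find_spikes(variances, threshold_pct):
    """Keep only services whose cost increase exceeds the threshold."""
    return {svc: pct for svc, pct in variances.items() if pct > threshold_pct}


def build_alert(spikes, threshold_pct):
    """Build a JSON-serializable payload for a Slack incoming webhook."""
    lines = [f"Cost spike above {threshold_pct:.0f}% threshold:"]
    lines += [f"- {svc}: +{pct:.1f}%" for svc, pct in sorted(spikes.items())]
    return {"text": "\n".join(lines)}


def post_to_slack(webhook_url, payload):
    """POST the payload to the webhook URL and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Illustrative week-over-week variances, in percent.
variances = {"compute": 32.5, "database": -4.0, "storage": 8.0}
spikes = find_spikes(variances, threshold_pct=20.0)
payload = build_alert(spikes, 20.0) if spikes else None

if payload:
    print(payload["text"])
    # post_to_slack(YOUR_WEBHOOK_URL, payload)  # left commented out: needs a real webhook
```

Nothing is sent when no service crosses the threshold, which is the point: the channel stays quiet unless there is something to act on.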

6. Empower your engineering teams

Even a centralized program needs the support of your engineering teams to be successful. Understanding cloud costs can be a full-time job; to empower your team, you need to translate these complicated cost insights into clear and applicable data that they can use.

Leverage cost allocation tags – these are labels assigned to resources that allow you to organize your resource costs and build out product-based cost reporting that informs your teams for action. Once this data is available, make your requests clear. How should they think about this data? How can they use it? How can they tie potential action back to your centralized framework?
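To illustrate how tags turn raw billing lines into product-based reporting, here is a small Python sketch that sums cost per tag value. The line-item shape, the `product` tag key, and the resource names are invented for the example. Real billing exports look different depending on your cloud provider.

```python
from collections import defaultdict


def cost_by_tag(line_items, tag_key):
    """Sum cost per value of tag_key; resources without the tag go to 'untagged'."""
    totals = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, "untagged")
        totals[owner] += item["cost"]
    return dict(totals)


# Illustrative billing line items with hypothetical tags.
line_items = [
    {"resource": "db-1", "cost": 120.0, "tags": {"product": "inbox"}},
    {"resource": "web-1", "cost": 80.0, "tags": {"product": "messenger"}},
    {"resource": "web-2", "cost": 40.0, "tags": {"product": "inbox"}},
    {"resource": "tmp-vol", "cost": 5.0, "tags": {}},
]

product_report = cost_by_tag(line_items, "product")
for product, total in sorted(product_report.items(), key=lambda kv: -kv[1]):
    print(f"{product}: {total:.2f}")
```

Keeping an explicit `untagged` bucket is a deliberate choice in this sketch: it shows how much spend is still invisible to product teams, which is a useful measure of how well a tagging policy is being followed.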

Infrastructure costs won’t be the only input on their roadmap; for some teams, they might not be the highest priority. Make the most of the limited attention you get, and let teams know the most meaningful way they can contribute.

Invest in the process

Infrastructure cost management needs regular attention, but that doesn’t mean it has to be difficult work. Design your own principles and tie in tangible actions that help you drive progress and results. Aim for practices that empower and support your engineers instead of blocking them. The more you invest in your team’s cost management process, the less it will restrict developers – and the more you’ll maximize returns. 
