Most engineers treat cost as someone else's problem — ops handles the bill, finance handles the budget. But the decisions that drive cloud spend are made at a keyboard, not in a spreadsheet. The choice of where to store data, how many times a service calls another service, whether you run your own database or use a managed one — these are architectural decisions, and they have price tags.
In this chapter we look at cloud cost from the inside out. We'll go through the hidden data transfer tax that catches most teams off guard, the art of matching storage to access patterns, how to think about the real total cost of open-source software, and how to build a culture where the teams who create cost can also see it.
- Cost is an architecture decision: The biggest cost drivers — how data moves, how storage is used, which services talk to which — are baked in at design time, not at billing time.
- Data transfer is the most commonly overlooked cost: Moving bytes between availability zones, regions, or out to the internet has a price. In data-heavy systems this can dwarf compute costs.
- Storage tiers exist because access patterns differ: Putting everything in hot storage is lazy and expensive. Putting everything in cold storage is cheap and unusable. Match the tier to how often you actually read the data.
- Open-source is never free: The license costs nothing. The engineers who operate it, the incidents they handle, and the on-call rotation that wakes up at 3am — those are the real costs. Managed services often win on total cost of ownership.
- Reserved capacity is an architectural commitment: Committing to reserved instances forces you to understand your baseline load, which makes you a better architect.
- You can't optimize what you can't see: Until cost is attributed to the team and service that caused it, no one has the right incentives to reduce it.
- Performance and cost are not opposites: Caching, batching, and better data models often improve both simultaneously. The cheapest request is the one you never make.
Why Cost Is an Architecture Problem
Here's a common story. A team builds a new feature. It ships. A month later, the cloud bill is up by $40,000. Someone files a ticket with the infrastructure team. The infrastructure team looks at the bill, sees a service that's calling another service thousands of times per second, and sends an email asking what happened. The engineering team says they had no idea it cost that much. Nobody is lying. Nobody was trying to be wasteful. The cost was invisible at design time and only visible after the fact.
This is the normal state in most engineering organizations. And it's a design flaw, not a people flaw.
The decisions that drive cost happen at the keyboard:
- Calling a remote service inside a tight loop instead of batching the calls
- Storing every event in a primary database instead of offloading to cheaper object storage
- Deploying a service to three availability zones when one would do
- Returning full objects when callers only need two fields
- Polling every 5 seconds when a webhook would do
None of these are bad engineering in isolation. But each one has a cost, and if engineers don't know what that cost is — or don't have any visibility into it — they'll make these choices without thinking twice.
The goal of this chapter is to make cost legible. Not to make you obsess over every byte, but to give you enough of a mental model that cost becomes a first-class input to your design decisions, alongside latency, reliability, and maintainability.
The Data Transfer Tax
If there's one cost concept that consistently surprises engineers, it's data transfer. Not storage. Not compute. Moving data.
Cloud providers charge you for bytes that leave their network. The exact numbers vary and change over time, but the structure is consistent across AWS, GCP, and Azure:
| Traffic Type | Approximate Cost | Notes |
|---|---|---|
| Within the same AZ | Free (or near-free) | Same availability zone, same region |
| Cross-AZ within a region | ~$0.01/GB in each direction | Charged on both the sending and receiving side, so effectively $0.02/GB moved |
| Cross-region | ~$0.02–$0.09/GB | Varies by region pair |
| Egress to the internet | ~$0.08–$0.09/GB | First 1–10 GB/month often free |
| CDN egress | ~$0.01–$0.02/GB | Much cheaper — and often faster |
Let's make this concrete. Suppose you have a service that handles user profile lookups. Each profile response is about 8 KB. Your service runs in us-east-1a but your database read replicas are in us-east-1b (a different AZ) for redundancy. You handle 500 requests per second.
Requests per second: 500
Response size: 8 KB
Data per second: 500 × 8 KB = 4 MB/s = ~345 GB/day
Cross-AZ transfer cost: $0.01/GB, charged on both sides
Effective cost: $0.02/GB transferred
Daily cost: 345 GB × $0.02 = $6.90/day
Monthly cost: ~$207/month
Annual cost: ~$2,500/year
This is just for profile lookups.
Add your other services and you can easily reach $50K–$100K/year in data transfer alone.
This isn't a hypothetical. Teams running microservices architectures with 20–30 services often discover that their data transfer costs rival or exceed their compute costs, because every service call crosses an AZ boundary.
The Three Patterns That Drive Transfer Costs
1. Chatty microservices
When a single user request fans out to 8 downstream services, each of which calls 2–3 more, you're sending dozens of small HTTP requests per user-facing request. Each one crosses an AZ boundary. The individual cost is tiny. The aggregate cost at scale is not.
The mitigation is not to abandon microservices — it's to be intentional about what goes where. Services that talk to each other constantly should be in the same AZ. Use service mesh features to enforce locality. Prefer aggregating calls at the edge (BFF pattern) over letting each client call 10 different services.
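To make the edge-aggregation idea concrete, here is a minimal sketch of a BFF endpoint that fans out to two downstream services server-side and returns one lean payload. The service URLs, field names, and the use of httpx are illustrative, not a prescription:

```python
# Hypothetical BFF aggregator: the client makes one call; the BFF, deployed
# alongside the downstream services, fans out internally and returns only the
# fields the client actually renders. Service URLs and fields are illustrative.
import asyncio
import httpx

DOWNSTREAM = {
    "profile": "http://profile-svc.internal/users/{id}",
    "orders": "http://order-svc.internal/users/{id}/orders?limit=5",
}

async def get_user_summary(user_id: str) -> dict:
    async with httpx.AsyncClient(timeout=2.0) as client:
        profile_resp, orders_resp = await asyncio.gather(
            client.get(DOWNSTREAM["profile"].format(id=user_id)),
            client.get(DOWNSTREAM["orders"].format(id=user_id)),
        )
    profile, orders = profile_resp.json(), orders_resp.json()
    # Trim to what the caller needs instead of forwarding full objects.
    return {
        "name": profile["name"],
        "avatar_url": profile["avatar_url"],
        "recent_orders": [o["id"] for o in orders],
    }
```

The client makes one request instead of ten, and the chatty traffic stays inside the cluster, where locality rules can keep it within a single AZ.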
2. Large payloads with small reads
A service returns a 200 KB JSON blob because that's what the schema says it should return. The caller extracts 3 fields and discards the rest. You paid for 200 KB of transfer. You needed about 200 bytes.
GraphQL was partly born from this exact problem — let clients ask for exactly what they need. But you don't need GraphQL to fix it. Sparse fieldsets in REST APIs, projection queries in databases, and deliberate API design that returns lean payloads by default are all effective.
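As a minimal sketch of sparse fieldsets in practice (the `fields` parameter, table, and column names are hypothetical; sqlite3 stands in for whatever database you use):

```python
# Sparse fieldsets: the caller asks for ?fields=id,email and we both project
# the database query and trim the response to those columns.
import sqlite3

ALLOWED_FIELDS = {"id", "email", "name", "created_at"}

def get_user(conn: sqlite3.Connection, user_id: int, fields_param: str | None) -> dict:
    requested = set((fields_param or "id,email,name,created_at").split(","))
    fields = sorted(requested & ALLOWED_FIELDS) or ["id"]
    # Project at the database level too, so we never read columns we won't return.
    row = conn.execute(
        f"SELECT {', '.join(fields)} FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return dict(zip(fields, row)) if row else {}
```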
3. Replicated data between regions
Running a multi-region active-active setup means your writes replicate across regions. If your system writes 1 TB per day and replicates to two other regions, that's 2 TB of cross-region transfer every day, which works out to roughly $1,200–$5,400/month at the per-GB rates above (depending on the region pair), just for replication.
This doesn't mean multi-region is wrong. It means it has a cost you need to plan for, and it should factor into your decision about how much data to replicate and how often.
Storage Tiering — Matching Cost to Access Pattern
Cloud object storage (S3, GCS, Azure Blob) has multiple tiers. Engineers often put everything in the standard/hot tier because it's the default and they don't have to think about it. This is expensive.
The economics of storage tiering work because different data has different access patterns. A user's recent transaction history is read frequently. A user's transaction history from three years ago is almost never read, except during an audit. Storing both at the same price makes no sense.
| Tier | Storage Cost (approx) | Retrieval Cost | Retrieval Latency | Right For |
|---|---|---|---|---|
| Standard (Hot) | ~$0.023/GB/month | Negligible | Milliseconds | Data accessed multiple times per week |
| Infrequent Access | ~$0.0125/GB/month | ~$0.01/GB | Milliseconds | Data accessed once a month or less |
| Glacier Instant | ~$0.004/GB/month | ~$0.03/GB | Milliseconds | Data accessed a few times per year |
| Glacier Deep Archive | ~$0.00099/GB/month | ~$0.02/GB + hours wait | Hours | Compliance archives, almost never accessed |
The math here matters. If you have 100 TB of data that's more than 90 days old and rarely accessed, moving it from Standard to Infrequent Access saves roughly:
100 TB = 100,000 GB
Standard: 100,000 × $0.023 = $2,300/month
Infrequent Access: 100,000 × $0.0125 = $1,250/month
Monthly savings: $1,050
Annual savings: $12,600
And that's just one data set.
Lifecycle Policies Are Not Optional
Every object store supports lifecycle policies — rules that automatically move objects between tiers based on age or last-accessed time. Setting these up is not premature optimization. It's basic hygiene.
A simple starting policy for most systems:
Day 0–30: Standard (hot) — active users, recent events, current state
Day 30–90: Standard-IA — still occasionally queried, e.g. reports
Day 90–365: Glacier Instant — mostly dormant, compliance access only
Day 365+: Glacier Deep Archive — pure compliance, assume you'll never read it
The exact thresholds depend on your access patterns. The point is to have a policy and enforce it automatically, rather than letting data accumulate in hot storage indefinitely.
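Expressed as code, the policy above might look like the following boto3 sketch; the bucket name is hypothetical and the thresholds should be tuned to your own access patterns:

```python
# Lifecycle rule matching the tiers above: Standard -> Standard-IA at 30 days,
# Glacier Instant Retrieval at 90 days, Deep Archive at 365 days.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="example-event-archive",  # illustrative bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-by-age",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to every object in the bucket
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER_IR"},
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    },
)
```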
The Retrieval Trap
Here's where teams get burned: they move data to Glacier to save on storage costs, then run a batch job that reads all of it. The storage savings evaporate and then some.
The retrieval cost structure is designed to make cold storage cheap for data you truly rarely touch. If you're accessing data in "archive" tier more than a few times per year, it's probably in the wrong tier.
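A rough worked example, using the approximate prices in the table above and a hypothetical 100 TB data set:

100 TB in Glacier Instant instead of Standard: saves ~$1,900/month in storage
One batch job that reads all 100 TB: 100,000 GB × $0.03/GB = $3,000 in retrieval
Run that job twice a month and the "savings" are negative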
Storage Tiering for Databases
Tiering isn't just for object storage. It applies to databases too, though the mechanism is different.
The classic problem: you have a Postgres database that holds 5 years of order history. 99% of reads hit orders from the last 90 days. But you're paying for a beefy instance with fast SSD storage to hold all 5 years because they're all in the same table.
The solution is data archival — periodically moving old rows from the primary database to cheaper storage (S3, BigQuery, a read-optimized columnar store). The primary database stays lean and fast. Old data is still queryable, just with higher latency, which is acceptable because nobody's waiting on it interactively.
This pattern has a name: hot-warm-cold data architecture. The key discipline is deciding, upfront, which queries need to hit which tier, and setting service-level agreements accordingly. If someone runs an annual report over 5 years of data and it takes 30 seconds, that's fine. If that query is being run in a user-facing product that expects a 200ms response, you have a mismatch.
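A minimal sketch of the archival step, assuming a Postgres orders table and an S3 archive bucket (names, the 90-day window, and the single-object export are illustrative; a production job would batch by date, verify the archive, and be idempotent):

```python
# Move orders older than 90 days out of the hot Postgres table into S3,
# then delete them from the primary. Illustrative names; no error handling.
import io
import boto3
import psycopg2

ARCHIVE_BUCKET = "example-orders-archive"
CUTOFF_SQL = "now() - interval '90 days'"

def archive_old_orders() -> None:
    s3 = boto3.client("s3")
    conn = psycopg2.connect("dbname=shop")  # connection string is illustrative
    with conn, conn.cursor() as cur:
        # Export the cold rows as CSV straight out of Postgres.
        buf = io.StringIO()
        cur.copy_expert(
            f"COPY (SELECT * FROM orders WHERE created_at < {CUTOFF_SQL}) "
            "TO STDOUT WITH CSV HEADER",
            buf,
        )
        s3.put_object(
            Bucket=ARCHIVE_BUCKET,
            Key="orders/archived-to-90-days.csv",  # a real job would key by date range
            Body=buf.getvalue().encode(),
        )
        # Only after the archive object is written do we trim the hot table.
        cur.execute(f"DELETE FROM orders WHERE created_at < {CUTOFF_SQL}")
    conn.close()
```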
Compute Cost — Matching Capacity to Demand
Compute costs are the most visible line on the cloud bill, but they're often not the biggest lever. Still, there's significant money left on the table in most organizations' compute purchasing strategy.
The Three Purchasing Models
Cloud providers offer the same compute at very different prices depending on how you commit.
| Model | Cost vs. On-Demand | Commitment | Best For | Risk |
|---|---|---|---|---|
| On-Demand | 100% (baseline) | None | Unpredictable or short workloads | None (you just pay more) |
| Reserved (1yr) | ~60% of baseline (~40% cheaper) | 1 year | Stable baseline load | Stranded cost if load drops |
| Reserved (3yr) | ~40% of baseline (~60% cheaper) | 3 years | Core infrastructure | High stranded-cost risk |
| Spot / Preemptible | ~10–30% of baseline (~70–90% cheaper) | None (can be reclaimed) | Batch, ML training, fault-tolerant jobs | Nodes can disappear mid-job |
The right strategy for most production systems is a mix:
- Reserved instances for your predictable baseline — the capacity you're running at 2am on a Tuesday. This is your floor.
- On-demand for bursting above the baseline during peak hours.
- Spot/Preemptible for background batch workloads that can tolerate interruption — nightly ETL jobs, model training, index builds.
The discipline required to use reserved instances well is the same discipline required to understand your traffic patterns. If you don't know your baseline load, you can't commit to reserved capacity responsibly. Committing too much leaves you paying for idle capacity. Committing too little means you're leaving 40–60% savings on the table.
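A hypothetical blend, with illustrative instance counts and an illustrative $0.10/hour on-demand rate, shows how the three models combine:

Baseline (20 instances, 730 hrs/month, reserved at ~60% of on-demand): 20 × 730 × $0.06 = ~$876
Peak burst (15 extra instances, ~6 hrs/day, on-demand): 15 × 180 × $0.10 = ~$270
Nightly ETL (10 instances, 4 hrs/night, spot at ~25% of on-demand): 10 × 120 × $0.025 = ~$30
Blended total: ~$1,176/month, versus ~$1,850/month if everything ran on-demand (roughly 36% saved)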
Right-Sizing — The Unsexy Savings
Most organizations are running instances that are 2–3x larger than they need to be. This happens for understandable reasons:
- An engineer picked the size conservatively during initial deploy and nobody revisited it
- A service was migrated from a physical server to a cloud instance with "at least as much RAM"
- CPU and memory utilization metrics weren't set up, so nobody noticed the machine was 15% utilized
Right-sizing is about looking at actual CPU and memory utilization over time (not just the peak you saw once during an incident) and downsizing or consolidating instances to match real demand. A service running at 10% CPU on a 32-core machine could probably run on a 4-core machine with headroom to spare.
The reason this is "unsexy" is that it requires saying "let's make this smaller" and then waiting nervously to see if it breaks. Nobody gets promoted for right-sizing. But it's one of the most straightforward ways to cut a cloud bill by 20–30%.
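Gathering the utilization evidence can be as simple as the following boto3 sketch (the instance ID and the 14-day window are illustrative; memory metrics need the CloudWatch agent, so only CPU is shown):

```python
# Pull two weeks of average and peak CPU for an instance so the downsizing
# conversation starts from data, not from the size someone picked at launch.
from datetime import datetime, timedelta, timezone
import boto3

cloudwatch = boto3.client("cloudwatch")

def cpu_utilization(instance_id: str, days: int = 14) -> dict:
    end = datetime.now(timezone.utc)
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=end - timedelta(days=days),
        EndTime=end,
        Period=3600,  # hourly datapoints
        Statistics=["Average", "Maximum"],
    )
    points = stats["Datapoints"]
    if not points:
        return {}
    return {
        "avg_cpu_pct": sum(p["Average"] for p in points) / len(points),
        "peak_cpu_pct": max(p["Maximum"] for p in points),
    }

print(cpu_utilization("i-0123456789abcdef0"))  # illustrative instance ID
```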
Build vs. Buy — The True Cost of Open-Source
Open-source software has no license fee. This leads to a dangerously incomplete mental model: "Kafka is free, so it costs us nothing to run our own Kafka cluster."
The license is the smallest part of the total cost of ownership (TCO). Let's build a real picture.
A Concrete Example: Self-Managed Kafka vs. Managed Kafka
Suppose your team needs a message queue handling 500 MB/s peak throughput. You're evaluating running your own Apache Kafka cluster versus using a managed service like Confluent Cloud or Amazon MSK.
Self-managed:
Infrastructure:
6 broker nodes (r5.2xlarge, reserved 1yr): ~$800/month each = $4,800/month
3 ZooKeeper nodes: ~$300/month total
Storage (10 TB, gp3): ~$800/month
Cross-AZ replication traffic: ~$400/month
Total infrastructure: ~$6,300/month
Engineering time:
Initial setup + tuning (one-time): ~3 weeks × 2 engineers
Ongoing operations (patching, tuning, capacity planning, incident response): ~0.3 FTE/year
At $200K all-in engineer cost: $60,000/year = $5,000/month
Incidents:
Major Kafka incident per quarter (avg): ~2 days of senior engineer time
4 × 2 days × $1,000/day (opportunity cost): $8,000/year = ~$670/month
Total self-managed per month: ~$12,000/month
Total self-managed per year: ~$144,000/year
Managed service:
Service fee, for 500 MB/s peak, moderate retention: ~$8,000–$12,000/month
Engineering time:
Setup + integration: ~1 week × 1 engineer (one-time)
Ongoing operations: ~0.05 FTE/year = $830/month
Total managed per month: ~$8,800–$12,800/month
Total managed per year: ~$106,000–$154,000/year
The numbers are surprisingly close. Sometimes managed is cheaper all-in. Sometimes self-managed wins. What the comparison makes clear is that engineering time is almost always the dominant cost, and it's the one that's most often left out of the analysis.
When Self-Managed Makes Sense
Self-managing infrastructure is not always wrong. There are legitimate reasons to do it:
- Control requirements: Your data cannot leave your VPC for compliance reasons. You need features not available in the managed offering. You need custom tuning beyond what the managed service exposes.
- Scale economics: At very large scale — tens of TB/day, hundreds of nodes — managed service pricing becomes extreme and the economics flip. Companies like Netflix, Uber, and Airbnb run their own infrastructure partly because they're big enough to amortize the engineering cost.
- Deep expertise already exists: If you already have 3 engineers who are Kafka experts and a proven runbook, the marginal cost of operating Kafka is much lower than if you're starting from scratch.
When Managed Makes Sense
For most teams, most of the time, managed services win:
- You're small: A 5-engineer team should not be running its own database cluster, message queue, and search infrastructure. The ops burden crowds out product work.
- Your expertise is elsewhere: A company whose core competency is payments should not be building deep Kafka expertise unless the queue is genuinely central to their technical edge.
- Reliability requirements are high: Managed services have dedicated reliability teams. Unless you're prepared to match that, the managed service's SLA is probably better than yours.
Cost and Performance Are Not Opposites
A common misconception: making a system faster always makes it more expensive. This is sometimes true, but often the opposite is true. Many optimizations simultaneously reduce latency and reduce cost.
The Cheapest Request Is the One You Don't Make
Caching a database response doesn't just make reads faster — it also means fewer database calls, which means less compute on the database server, which means you can run a smaller database instance, or fewer of them. A cache that handles 80% of reads is not just a latency optimization. It's a cost multiplier on your database fleet.
Similarly, batching writes instead of writing one row at a time reduces database connection overhead, reduces per-request latency, and often allows you to use a smaller database instance.
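A minimal illustration of the batching point, with sqlite3 standing in for any relational database and an illustrative events table:

```python
# One round trip for 500 rows instead of 500 round trips. The same idea applies
# to Postgres/MySQL via executemany or multi-row INSERT statements.
import sqlite3

def write_events_batched(conn: sqlite3.Connection, events: list[tuple]) -> None:
    with conn:  # single transaction, single flush
        conn.executemany(
            "INSERT INTO events (user_id, kind, payload) VALUES (?, ?, ?)", events
        )

def write_events_one_by_one(conn: sqlite3.Connection, events: list[tuple]) -> None:
    for event in events:  # N transactions, N round trips: slower and more expensive
        with conn:
            conn.execute(
                "INSERT INTO events (user_id, kind, payload) VALUES (?, ?, ?)", event
            )
```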
Expensive Fan-Out Patterns
Some architectural patterns are both slow and expensive. The N+1 query is the most famous example: fetch 100 users, then for each user make a separate database call to fetch their profile. That's 101 queries instead of 1. 101 database round-trips instead of 1. 101× the cost.
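The fix is usually one batched query instead of a query per item; a sketch with an illustrative profiles table:

```python
# N+1: one query for users, then one query per user for the profile.
def profiles_n_plus_one(conn, user_ids):
    out = {}
    for uid in user_ids:  # 100 separate round trips
        out[uid] = conn.execute(
            "SELECT bio, avatar_url FROM profiles WHERE user_id = ?", (uid,)
        ).fetchone()
    return out

# Batched: one round trip for all 100 users.
def profiles_batched(conn, user_ids):
    placeholders = ", ".join("?" for _ in user_ids)
    rows = conn.execute(
        f"SELECT user_id, bio, avatar_url FROM profiles WHERE user_id IN ({placeholders})",
        list(user_ids),
    ).fetchall()
    return {user_id: (bio, avatar_url) for user_id, bio, avatar_url in rows}
```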
The same pattern appears at the service level. A service that handles one request by making 20 downstream calls is slow (those calls are often serial or partially serial) and expensive (20× the network and compute cost per user request).
Fixing fan-out patterns — through batching, caching, data denormalization, or rethinking the data model — usually improves both latency and cost at the same time.
Compression — Cheap and Often Forgotten
Enabling HTTP/gRPC response compression costs a modest amount of server CPU and reduces payload size by 60–80% for typical JSON payloads. That means 60–80% less data transfer cost on every API call.
Enabling compression on Kafka messages, S3 objects, and database backups can similarly slash storage and transfer costs. The CPU cost of compression (and decompression) is almost always less than the storage and transfer savings.
Uncompressed JSON response: 12 KB average
After gzip compression: 2 KB average (83% reduction)
At 1,000 requests/second:
Without compression: 12 KB × 1,000 × 86,400s = ~1 TB/day egress
With compression: 2 KB × 1,000 × 86,400s = ~170 GB/day egress
Egress savings per day: ~830 GB × $0.09/GB = $74.70/day = $2,240/month
This is from enabling one flag.
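What the flag looks like depends on the stack. As one example, in a FastAPI/Starlette service it is a single middleware registration (the 1 KB minimum size is an illustrative threshold):

```python
# Compress responses above ~1 KB; very small responses aren't worth the CPU.
from fastapi import FastAPI
from fastapi.middleware.gzip import GZipMiddleware

app = FastAPI()
app.add_middleware(GZipMiddleware, minimum_size=1024)
```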
FinOps — Making Cost Visible
FinOps is the practice of bringing financial accountability to cloud spending. The core idea is simple: the engineers who make architectural decisions should be able to see the cost consequences of those decisions.
In practice, most organizations have a gap between the people who cause cost and the people who see it. Finance sees the bill. Engineering sees the architecture. The bill doesn't say "this is expensive because service X calls service Y 10 million times per day." It says "data transfer: $42,000."
Cost Allocation Tags — The Foundation
Every cloud resource should be tagged with at minimum: the team that owns it, the service it belongs to, and the environment (production, staging, dev). These tags flow into your billing reports and let you answer "how much does the recommendations service cost per month?"
Without tags, you have a total cloud bill. With tags, you have per-service cost attribution. Per-service cost attribution is what gives teams ownership over their spend.
team: payments
service: checkout-api
environment: production
cost-center: eng-checkout
Unit Economics — The Right Level of Abstraction
A raw dollar amount per service is useful. A cost-per-unit metric is more useful because it scales.
Cost per 1,000 API requests. Cost per active user per day. Cost per GB of data processed. These metrics let you answer questions like: "We grew 2× this quarter, but our cloud bill grew 3×. Something in our cost structure isn't scaling linearly." That's a question you can investigate and fix. "Our cloud bill is $500K this month" is a number you can only stare at.
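A hypothetical illustration of why the per-unit view is more actionable than the raw total:

Q1: $250K bill, 500M requests -> $0.50 per 1,000 requests
Q2: $500K bill, 1B requests -> $0.50 per 1,000 requests (healthy: cost scaled with traffic)
Q3: $900K bill, 1.2B requests -> $0.75 per 1,000 requests (investigate: unit cost jumped 50%)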
Cost Anomaly Detection
AWS, GCP, and Azure all offer cost anomaly detection — alerts when a service's spend spikes unexpectedly. Setting these up takes 10 minutes. They catch things like: a misconfigured autoscaler that spun up 500 instances instead of 5; a new feature that does a full table scan on every request; a job that was supposed to run once but got triggered in a loop.
The goal is to catch a $10,000/day mistake on day one rather than day thirty. Without anomaly detection, you discover these problems when the monthly bill arrives.
The Cultural Side
FinOps only works if cost is part of the engineering culture, not a separate finance exercise. A few practices that help:
- Add cost to architecture reviews. When someone proposes a design, ask: "What does this cost at 10× our current load?" It doesn't have to be exact. Back-of-envelope is fine. The goal is to make cost legible before a decision is made, not after.
- Put cost metrics on team dashboards. Latency and error rate aren't the only things worth monitoring. Cost per request, normalized by traffic, should be a line on the team's dashboard. When it moves, someone should be curious why.
- Celebrate cost savings like feature launches. A team that cuts their cloud bill by $30K/month while maintaining SLOs did something genuinely valuable. If the only thing that gets celebrated is shipping features, cost optimization will always be deprioritized.
- Shared savings incentives. Some organizations redirect a portion of documented cloud savings back to the engineering team's budget. This creates a direct incentive that aligns engineering behavior with financial outcomes.
Cost at Design Time — A Mental Checklist
The best time to think about cost is before you build something. Not obsessively — you don't need to calculate the exact bill for every design. But a quick mental pass over the following questions surfaces the expensive surprises before they happen.
- Where does data flow? Does it cross AZ boundaries? Region boundaries? Exit to the internet? Sketch the data flow and count how many hops involve billable transfer.
- How big are the payloads? Are we sending full objects when we only need a few fields? Can we compress? Can we reduce payload size by design?
- What's the read/write ratio and how does it scale with traffic? If user count doubles, does data transfer double? Does storage double? Does compute double? Non-linear scaling is worth catching early.
- What access pattern does this data have? Will it be read frequently, infrequently, or almost never? Is it going into the right storage tier?
- Are we building this or buying it? If building, what's the honest engineering time estimate, not just for initial build but for ongoing operations?
- What happens at 10× scale? Cost structures that are tolerable at current scale can become catastrophic at 10× if they have bad scaling properties.
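To make the first few questions mechanical, here is a back-of-envelope helper; the per-GB rates are the approximate figures used earlier in this chapter and should be replaced with your provider's actual pricing:

```python
# Back-of-envelope data transfer estimate for a design review.
CROSS_AZ_PER_GB = 0.02      # charged on both sides, per the table above
CROSS_REGION_PER_GB = 0.02  # varies widely by region pair
EGRESS_PER_GB = 0.09

def monthly_transfer_cost(
    requests_per_second: float,
    payload_kb: float,
    cross_az_hops: int,
    cross_region_hops: int = 0,
    egress_fraction: float = 0.0,  # share of responses that leave to the internet
) -> float:
    gb_per_month = requests_per_second * payload_kb * 86_400 * 30 / 1_000_000
    cost = gb_per_month * (
        cross_az_hops * CROSS_AZ_PER_GB
        + cross_region_hops * CROSS_REGION_PER_GB
        + egress_fraction * EGRESS_PER_GB
    )
    return round(cost, 2)

# The profile-lookup example from earlier: 500 req/s, 8 KB, one cross-AZ hop (~$207/month).
print(monthly_transfer_cost(500, 8, cross_az_hops=1))
# The same design at 10x traffic (~$2,073/month).
print(monthly_transfer_cost(5000, 8, cross_az_hops=1))
```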
Cost is determined by architecture, not by the billing team. The engineers who decide how data moves, where it's stored, and how services communicate are the people who decide the cloud bill — usually without knowing it. Making cost a first-class design input is the only way to avoid discovering expensive mistakes after they're too late to fix cheaply.
A related pitfall is treating open-source software as free because it has no license fee. The license is the smallest part of the cost. Engineering time to operate, oncall burden, and incident recovery are the real costs — and they're large enough to make managed services cheaper than self-managed ones in most cases, once you account for everything honestly.
- Draw the data flow of this design and mark every boundary where bytes are billed for transfer. What does this cost at 10× current traffic?
- For every data store in this design: what is the realistic access pattern 12 months after launch? Is it in the right storage tier, or are we paying hot-storage prices for cold data?
- If we're self-managing any piece of infrastructure in this design, what is the full TCO including engineering time, oncall burden, and incident cost — and does it still win over the managed alternative?