Appendix A

The Numbers Every Engineer Should Know

A reference card of latency, throughput, storage, and availability numbers. You do not need to memorize all of these precisely — you need to know the order of magnitude so you can immediately recognize when a design is asking for the impossible, or when an optimization is in the wrong place.

How to Use This Appendix

These numbers are approximate and will vary by hardware generation, cloud provider, and configuration. The point is not precision — it is the ratio between things. Knowing that a network round trip is roughly 100x slower than a memory access is far more useful than knowing either number exactly. When designing, reason in orders of magnitude first.

Latency Numbers

These are the building blocks of every performance argument. When someone says "we can't afford a network call in this hot path," this table is why.

| Operation | Latency | Notes |
|---|---|---|
| **CPU & Memory** | | |
| L1 cache reference | ~0.5 ns | The fastest you can read anything |
| L2 cache reference | ~7 ns | 14x L1 |
| L3 cache reference | ~40 ns | Still sub-microsecond |
| Main memory (RAM) access | ~100 ns | 200x L1. The first "slow" thing. |
| Mutex lock/unlock | ~25 ns | Contention makes this much worse |
| **Storage** | | |
| NVMe SSD random read | ~100 µs | 1,000x RAM. Fast for storage, slow vs. memory. |
| NVMe SSD sequential read (1 MB) | ~200 µs | At ~5 GB/s. Sequential is 2–10x faster than random per device. |
| SATA SSD random read | ~300 µs | 3x slower than NVMe |
| HDD seek + read | ~10 ms | 100x slower than NVMe SSD. Near-dead for hot paths. |
| HDD sequential read (1 MB) | ~5 ms | At ~200 MB/s. Sequential recovers a lot of HDD's deficit. |
| **Network** | | |
| Same-datacenter round trip | ~0.5 ms | The baseline for same-region service calls |
| Cross-AZ round trip (same region) | ~1–2 ms | Why cross-AZ replication has a cost |
| Cross-region round trip (US) | ~40–80 ms | Speed of light across ~3,000 miles |
| Cross-continent round trip (US → EU) | ~80–120 ms | Why CDNs exist. The speed of light sets a hard floor. |
| Cross-continent round trip (US → Asia) | ~150–200 ms | Anything interactive needs edge presence |
| TCP handshake overhead | 1 RTT | Why connection pools matter at scale |
| TLS handshake overhead | 1–2 RTT | TLS 1.3 cut this to 1 RTT (0-RTT resumption possible) |
| **Operations at Scale** | | |
| Compress 1 KB with Snappy | ~3 µs | Fast enough to almost always be worth it before a network hop |
| Send 1 MB over 1 Gbps network | ~8 ms | Bandwidth ≠ latency. Still feels slow. |
| Read 1 MB sequentially from memory | ~20 µs | At ~50 GB/s; ~400x faster than sending the same MB over 1 Gbps |
The Key Ratios to Remember

RAM is ~1,000x faster than SSD. Caching works because this ratio is enormous.

SSD is ~100x faster than HDD. If you are still using HDD for a random-read workload, this is why your tail latencies look the way they do.

Same-DC network is ~5,000x slower than RAM. Every remote call pays this cost. Design accordingly.

Cross-region network adds 40–200 ms. No amount of optimization beats physics. Put data near users.
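
To turn these ratios into a design argument, it helps to write the budget down. Here is a minimal sketch in Python: the constants come from the table above, while the three-call request shape is a hypothetical workload, not a measurement.

```python
# Back-of-the-envelope latency budget built from the table above.
# All constants are approximate and expressed in microseconds.
SAME_DC_RTT = 500          # ~0.5 ms
NVME_RANDOM_READ = 100     # ~100 µs
CROSS_REGION_RTT = 60_000  # ~60 ms, middle of the 40-80 ms range

def print_budget(steps: dict) -> None:
    """Print each step's cost and its share of the total (fully sequential case)."""
    total = sum(steps.values())
    for name, cost in steps.items():
        print(f"{name:<32} {cost:>10,.0f} µs  ({cost / total:.0%})")
    print(f"{'TOTAL':<32} {total:>10,.0f} µs")

# Hypothetical request: three sequential same-DC calls, each doing one
# NVMe random read, plus a single cross-region call.
print_budget({
    "3x same-DC round trip": 3 * SAME_DC_RTT,
    "3x NVMe random read": 3 * NVME_RANDOM_READ,
    "1x cross-region round trip": CROSS_REGION_RTT,
})
# The cross-region hop is ~97% of the total. Optimize there first.
```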


Throughput and Capacity Numbers

Throughput is different from latency — it tells you how many operations you can sustain over time, not how fast a single one is. Both matter, and they often pull in opposite directions.
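
One way to see the tension is Little's Law: the concurrency a system must sustain equals throughput times latency. A minimal sketch, with illustrative (made-up) workload numbers:

```python
# Little's Law: in-flight work = throughput x latency.
# A service handling 10,000 req/s at 50 ms per request must hold
# 10,000 * 0.050 = 500 requests in flight at all times, which is why
# raising throughput and cutting latency fight over the same resources.
def required_concurrency(throughput_per_s: float, latency_s: float) -> float:
    return throughput_per_s * latency_s

print(required_concurrency(10_000, 0.050))  # 500.0 threads/connections/slots
```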

| Resource | Typical Throughput | Notes |
|---|---|---|
| **Storage I/O** | | |
| NVMe SSD sequential read/write | 3–7 GB/s | Modern gen4/gen5 NVMe. Most workloads bottleneck elsewhere first. |
| NVMe SSD random IOPS (4K) | 500K–1M IOPS | Random I/O, not sequential, is the bottleneck for most databases |
| SATA SSD random IOPS (4K) | ~100K IOPS | 5–10x below NVMe for random |
| HDD random IOPS | ~100–200 IOPS | The spinning-platter bottleneck. 1,000x below NVMe. |
| HDD sequential throughput | ~150–200 MB/s | HDD is still fine for sequential-heavy analytics workloads |
| **Network** | | |
| 1 GbE NIC (typical VM) | ~125 MB/s | Often the bottleneck on older cloud instance types |
| 10 GbE NIC (modern cloud) | ~1.25 GB/s | Standard for compute-optimized cloud instances |
| 25/40 GbE (high-perf cloud) | 3–5 GB/s | Large instances, network-optimized tiers |
| Single TCP connection | ~1 Gbps | Limited by congestion window and RTT. Multiple connections are needed to fill a fat pipe. |
| **Databases (rough baselines)** | | |
| PostgreSQL simple reads (single node) | ~10K–50K QPS | Highly dependent on query complexity, indexes, RAM vs. disk |
| PostgreSQL writes (with fsync) | ~1K–5K TPS | Durable writes are expensive. Batching helps significantly. |
| Redis (single node, simple ops) | ~100K–1M ops/s | In-memory; pipelining pushes toward the higher end |
| Kafka (single broker, produce) | ~100K–500K msg/s | Throughput vs. latency trade-off in producer config |
| S3 / object storage (single prefix) | ~5,500 GET/s, 3,500 PUT/s | Per prefix. Sharding across prefixes breaks this limit. |
| **Memory & CPU** | | |
| Memory bandwidth (modern server) | ~50–200 GB/s | DDR5, multi-channel. Rarely the bottleneck outside heavy analytics. |
| Simple HTTP request (Go/Java, no DB) | ~50K–200K req/s | Per core. Network becomes the limit well before CPU for most apps. |
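
These baselines are most useful as feasibility checks. A minimal sketch, pitting the table's PostgreSQL write number against a hypothetical workload:

```python
# Does a hypothetical write workload fit on a single Postgres node?
# All workload numbers below are illustrative, not measurements.
PEAK_WRITES_PER_SEC = 8_000   # hypothetical application peak
PG_DURABLE_TPS = 5_000        # upper end of the table's 1K-5K range
BATCH_SIZE = 10               # rows per transaction if writes are batched

fits_unbatched = PEAK_WRITES_PER_SEC <= PG_DURABLE_TPS
tps_if_batched = PEAK_WRITES_PER_SEC / BATCH_SIZE

print(f"Unbatched: {PEAK_WRITES_PER_SEC:,} TPS needed vs ~{PG_DURABLE_TPS:,} "
      f"available -> {'fits' if fits_unbatched else 'does not fit'}")
print(f"Batched x{BATCH_SIZE}: ~{tps_if_batched:,.0f} TPS needed -> fits easily")
```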

Availability and Downtime

"Nines" are how reliability is measured in practice. Knowing what they translate to in minutes per year changes every SLO conversation.

| Availability | Downtime / Year | Downtime / Month | Downtime / Week | Typical Use Case |
|---|---|---|---|---|
| 90% (1 nine) | 36.5 days | 73 hours | 16.8 hours | Internal batch jobs, dev tools |
| 95% | 18.25 days | 36.5 hours | 8.4 hours | Non-critical internal services |
| 99% (2 nines) | 3.65 days | 7.3 hours | 1.68 hours | Internal dashboards, low-stakes APIs |
| 99.5% | 1.83 days | 3.65 hours | 50 min | Consumer apps where downtime is noticeable |
| 99.9% (3 nines) | 8.77 hours | 43.8 min | 10.1 min | Standard SaaS, internal critical services |
| 99.95% | 4.38 hours | 21.9 min | 5 min | High-value consumer products |
| 99.99% (4 nines) | 52.6 min | 4.38 min | 1 min | Payments, auth, core API infrastructure |
| 99.999% (5 nines) | 5.26 min | 26.3 sec | 6 sec | Telecom, financial clearing, life-critical systems |
The Compounding Problem

If service A calls service B, which calls service C, and each has 99.9% availability, the combined availability is 99.9% × 99.9% × 99.9% ≈ 99.7% — three times the downtime of any single service. Long synchronous call chains destroy your effective availability. This is why timeouts, circuit breakers, and graceful degradation are not optional.
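
The penalty is easy to compute for any chain. A minimal sketch, assuming failures are independent (which is optimistic in practice):

```python
# Compound availability of a synchronous call chain.
import math

def chain_availability(availabilities: list) -> float:
    """Availability of a chain where every call must succeed."""
    return math.prod(availabilities)

def downtime_minutes_per_year(availability: float) -> float:
    return (1 - availability) * 365.25 * 24 * 60

chain = chain_availability([0.999, 0.999, 0.999])
print(f"3-service chain: {chain:.4%}, "
      f"~{downtime_minutes_per_year(chain):,.0f} min/year of downtime")
print(f"single service:  {0.999:.4%}, "
      f"~{downtime_minutes_per_year(0.999):,.0f} min/year of downtime")
# -> ~1,576 min/year for the chain vs ~526 min/year for one service
```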


Storage Scale Reference

| Unit | Size | Relatable Example |
|---|---|---|
| 1 KB | 1,024 bytes | A short plain-text email |
| 1 MB | ~10⁶ bytes | A compressed photo; 1,000 average database rows |
| 1 GB | ~10⁹ bytes | ~200,000 average database rows; a small relational DB |
| 1 TB | ~10¹² bytes | ~200M database rows; a medium-sized production DB |
| 1 PB | ~10¹⁵ bytes | A large data warehouse; ~200B rows; all photos on a mid-size social platform |
| 1 EB | ~10¹⁸ bytes | Hyperscaler territory. The oft-cited "~2.5 EB of data created globally per day" dates from the early 2010s; 2024 estimates run far higher. |
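
The row counts in this table imply a rule of thumb of roughly 5 KB per average row, which is enough for quick capacity estimates. A sketch with a made-up ingest rate:

```python
# Quick capacity estimate from the ~5 KB-per-row rule of thumb implied
# by the table (1 GB ~ 200K rows). Workload numbers are hypothetical.
AVG_ROW_BYTES = 5 * 1024
ROWS_PER_DAY = 2_000_000      # hypothetical ingest rate
RETENTION_DAYS = 365
REPLICATION_FACTOR = 3        # common default for durable storage

raw = ROWS_PER_DAY * RETENTION_DAYS * AVG_ROW_BYTES
total_tb = raw * REPLICATION_FACTOR / 1e12
print(f"~{raw / 1e12:.1f} TB raw, ~{total_tb:.1f} TB with replication")
# -> ~3.7 TB raw, ~11.2 TB replicated: a medium-sized production DB
```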

Cloud Cost Ratios (Approximate)

Absolute prices change constantly and vary by provider. These ratios are more stable and more useful for design decisions.

| What You're Paying For | Relative Cost | Design Implication |
|---|---|---|
| Intra-AZ data transfer | Free | Keep hot paths in the same AZ when you can |
| Cross-AZ data transfer (same region) | ~$0.01/GB | Small but non-zero. Replication and read replicas have a cost. |
| Egress to internet (cloud → user) | ~$0.08–0.12/GB | 8–12x cross-AZ. CDNs reduce this significantly. |
| Cross-region data transfer | ~$0.02–0.08/GB | Multi-region architectures have a real data-transfer line item |
| Object storage (S3/GCS) | ~$0.02/GB/month | Cheapest durable storage. Archive tiers go 10x lower. |
| Block storage (EBS gp3) | ~$0.08/GB/month | 4x object storage, but low latency, so worth it for databases |
| Managed database storage | ~$0.12–0.25/GB/month | 6–12x object storage. Includes replication cost. |
| In-memory (ElastiCache/Redis) | ~$5–15/GB/month | Hundreds of times object storage. Only cache what earns it. |
The Data Transfer Tax

Data transfer (especially egress) is frequently the largest and most surprising cloud bill line item for data-heavy systems. A system that moves 100 TB of data per month to end users pays ~$8,000–12,000/month in egress alone, before compute or storage. Design your data locality with this in mind: keep processing close to storage, use CDNs for user-facing content, and question any architecture that requires large cross-region data movement.
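
The arithmetic behind that figure, as a reusable sketch (rates are the table's approximations, not current list prices):

```python
# Monthly data-transfer cost for the 100 TB/month example above.
# Rates are the approximate figures from the table, not list prices.
GB_PER_MONTH = 100 * 1_000    # 100 TB expressed in GB

egress_low, egress_high = GB_PER_MONTH * 0.08, GB_PER_MONTH * 0.12
xregion_low, xregion_high = GB_PER_MONTH * 0.02, GB_PER_MONTH * 0.08

print(f"Egress to users:   ${egress_low:,.0f}-${egress_high:,.0f}/month")
print(f"Cross-region copy: ${xregion_low:,.0f}-${xregion_high:,.0f}/month")
# -> $8,000-$12,000/month egress, $2,000-$8,000/month cross-region
```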
