Appendix A

The Numbers Every Engineer Should Know

A reference card of latency, throughput, storage, and availability numbers. You do not need to memorize all of these precisely — you need to know the order of magnitude so you can immediately recognize when a design is asking for the impossible, or when an optimization is in the wrong place.

How to Use This Appendix

These numbers are approximate and will vary by hardware generation, cloud provider, and configuration. The point is not precision — it is the ratio between things. Knowing that a network round trip is roughly 100x slower than a memory access is far more useful than knowing either number exactly. When designing, reason in orders of magnitude first.

Latency Numbers

These are the building blocks of every performance argument. When someone says "we can't afford a network call in this hot path," this table is why.

| Operation | Latency | Notes |
|---|---|---|
| **CPU & Memory** | | |
| L1 cache reference | ~0.5 ns | The fastest you can read anything |
| L2 cache reference | ~7 ns | 14x L1 |
| L3 cache reference | ~40 ns | Still sub-microsecond |
| Main memory (RAM) access | ~100 ns | 200x L1. The first "slow" thing. |
| Mutex lock/unlock | ~25 ns | Contention makes this much worse |
| **Storage** | | |
| NVMe SSD random read | ~100 µs | 1,000x RAM. Fast for storage, slow vs. memory. |
| NVMe SSD sequential read (1 MB) | ~200 µs | At ~5 GB/s. Sequential is 2–10x faster than random per device. |
| SATA SSD random read | ~300 µs | 3x slower than NVMe |
| HDD seek + read | ~10 ms | 100x slower than NVMe SSD. Near-dead for hot paths. |
| HDD sequential read (1 MB) | ~5 ms | At ~200 MB/s. Sequential recovers a lot of HDD's deficit. |
| **Network** | | |
| Same-datacenter round trip | ~0.5 ms | The baseline for same-region service calls |
| Cross-AZ round trip (same region) | ~1–2 ms | Why cross-AZ replication has a cost |
| Cross-region round trip (US) | ~40–80 ms | Speed of light across ~3,000 miles |
| Cross-continent round trip (US → EU) | ~80–120 ms | Why CDNs exist. The speed of light sets a hard floor. |
| Cross-continent round trip (US → Asia) | ~150–200 ms | Anything interactive needs edge presence |
| TCP handshake overhead | 1 RTT | Why connection pools matter at scale |
| TLS handshake overhead | 1–2 RTT | TLS 1.3 cut this to 1 RTT (0-RTT resumption possible) |
| **Operations at Scale** | | |
| Compress 1 KB with Snappy | ~3 µs | Fast enough to almost always be worth it before a network hop |
| Send 1 MB over 1 Gbps network | ~8 ms | Bandwidth ≠ latency. Still feels slow. |
| Read 1 MB sequentially from memory | ~20 µs | At ~50 GB/s; ~400x faster than sending the same MB over 1 Gbps |
The Key Ratios to Remember

RAM is ~1,000x faster than SSD. Caching works because this ratio is enormous.

SSD is ~100x faster than HDD. If you are still using HDD for a random-read workload, this is why your tail latencies look the way they do.

Same-DC network is ~5,000x slower than RAM. Every remote call pays this cost. Design accordingly.

Cross-region network adds 40–200 ms. No amount of optimization beats physics. Put data near users.
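
To turn these ratios into a design argument, it helps to write the budget down. Here is a minimal sketch in Python: the constants come from the table above, while the three-call request shape is a hypothetical workload, not a measurement.

```python
# Back-of-the-envelope latency budget built from the table above.
# All constants are approximate and expressed in microseconds.
SAME_DC_RTT = 500          # ~0.5 ms
NVME_RANDOM_READ = 100     # ~100 µs
CROSS_REGION_RTT = 60_000  # ~60 ms, middle of the 40-80 ms range

def print_budget(steps: dict) -> None:
    """Print each step's cost and its share of the total (fully sequential case)."""
    total = sum(steps.values())
    for name, cost in steps.items():
        print(f"{name:<32} {cost:>10,.0f} µs  ({cost / total:.0%})")
    print(f"{'TOTAL':<32} {total:>10,.0f} µs")

# Hypothetical request: three sequential same-DC calls, each doing one
# NVMe random read, plus a single cross-region call.
print_budget({
    "3x same-DC round trip": 3 * SAME_DC_RTT,
    "3x NVMe random read": 3 * NVME_RANDOM_READ,
    "1x cross-region round trip": CROSS_REGION_RTT,
})
# The cross-region hop is ~97% of the total. Optimize there first.
```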


Throughput and Capacity Numbers

Throughput is different from latency — it tells you how many operations you can sustain over time, not how fast a single one is. Both matter, and they often pull in opposite directions.
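
One way to see the tension is Little's Law: the concurrency a system must sustain equals throughput times latency. A minimal sketch, with illustrative (made-up) workload numbers:

```python
# Little's Law: in-flight work = throughput x latency.
# A service handling 10,000 req/s at 50 ms per request must hold
# 10,000 * 0.050 = 500 requests in flight at all times, which is why
# raising throughput and cutting latency fight over the same resources.
def required_concurrency(throughput_per_s: float, latency_s: float) -> float:
    return throughput_per_s * latency_s

print(required_concurrency(10_000, 0.050))  # 500.0 threads/connections/slots
```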

| Resource | Typical Throughput | Notes |
|---|---|---|
| **Storage I/O** | | |
| NVMe SSD sequential read/write | 3–7 GB/s | Modern gen4/gen5 NVMe. Most workloads bottleneck elsewhere first. |
| NVMe SSD random IOPS (4K) | 500K–1M IOPS | Random I/O, not sequential, is the bottleneck for most databases |
| SATA SSD random IOPS (4K) | ~100K IOPS | 5–10x below NVMe for random |
| HDD random IOPS | ~100–200 IOPS | The spinning-platter bottleneck. 1,000x below NVMe. |
| HDD sequential throughput | ~150–200 MB/s | HDD is still fine for sequential-heavy analytics workloads |
| **Network** | | |
| 1 GbE NIC (typical VM) | ~125 MB/s | Often the bottleneck on older cloud instance types |
| 10 GbE NIC (modern cloud) | ~1.25 GB/s | Standard for compute-optimized cloud instances |
| 25/40 GbE (high-perf cloud) | 3–5 GB/s | Large instances, network-optimized tiers |
| Single TCP connection | ~1 Gbps | Limited by congestion window and RTT. Multiple connections are needed to fill a fat pipe. |
| **Databases (rough baselines)** | | |
| PostgreSQL simple reads (single node) | ~10K–50K QPS | Highly dependent on query complexity, indexes, RAM vs. disk |
| PostgreSQL writes (with fsync) | ~1K–5K TPS | Durable writes are expensive. Batching helps significantly. |
| Redis (single node, simple ops) | ~100K–1M ops/s | In-memory; pipelining pushes toward the higher end |
| Kafka (single broker, produce) | ~100K–500K msg/s | Throughput vs. latency trade-off in producer config |
| S3 / object storage (single prefix) | ~5,500 GET/s, 3,500 PUT/s | Per prefix. Sharding across prefixes breaks this limit. |
| **Memory & CPU** | | |
| Memory bandwidth (modern server) | ~50–200 GB/s | DDR5, multi-channel. Rarely the bottleneck outside heavy analytics. |
| Simple HTTP request (Go/Java, no DB) | ~50K–200K req/s | Per core. Network becomes the limit well before CPU for most apps. |
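
These baselines are most useful as feasibility checks. A minimal sketch, pitting the table's PostgreSQL write number against a hypothetical workload:

```python
# Does a hypothetical write workload fit on a single Postgres node?
# All workload numbers below are illustrative, not measurements.
PEAK_WRITES_PER_SEC = 8_000   # hypothetical application peak
PG_DURABLE_TPS = 5_000        # upper end of the table's 1K-5K range
BATCH_SIZE = 10               # rows per transaction if writes are batched

fits_unbatched = PEAK_WRITES_PER_SEC <= PG_DURABLE_TPS
tps_if_batched = PEAK_WRITES_PER_SEC / BATCH_SIZE

print(f"Unbatched: {PEAK_WRITES_PER_SEC:,} TPS needed vs ~{PG_DURABLE_TPS:,} "
      f"available -> {'fits' if fits_unbatched else 'does not fit'}")
print(f"Batched x{BATCH_SIZE}: ~{tps_if_batched:,.0f} TPS needed -> fits easily")
```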

Availability and Downtime

"Nines" are how reliability is measured in practice. Knowing what they translate to in minutes per year changes every SLO conversation.

| Availability | Downtime / Year | Downtime / Month | Downtime / Week | Typical Use Case |
|---|---|---|---|---|
| 90% (1 nine) | 36.5 days | 73 hours | 16.8 hours | Internal batch jobs, dev tools |
| 95% | 18.25 days | 36.5 hours | 8.4 hours | Non-critical internal services |
| 99% (2 nines) | 3.65 days | 7.3 hours | 1.68 hours | Internal dashboards, low-stakes APIs |
| 99.5% | 1.83 days | 3.65 hours | 50 min | Consumer apps where downtime is noticeable |
| 99.9% (3 nines) | 8.77 hours | 43.8 min | 10.1 min | Standard SaaS, internal critical services |
| 99.95% | 4.38 hours | 21.9 min | 5 min | High-value consumer products |
| 99.99% (4 nines) | 52.6 min | 4.38 min | 1 min | Payments, auth, core API infrastructure |
| 99.999% (5 nines) | 5.26 min | 26.3 sec | 6 sec | Telecom, financial clearing, life-critical systems |
The Compounding Problem

If service A calls service B, which calls service C, and each has 99.9% availability, the combined availability is 99.9% × 99.9% × 99.9% ≈ 99.7% — three times the downtime of any single service. Long synchronous call chains destroy your effective availability. This is why timeouts, circuit breakers, and graceful degradation are not optional.
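
The penalty is easy to compute for any chain. A minimal sketch, assuming failures are independent (which is optimistic in practice):

```python
# Compound availability of a synchronous call chain.
import math

def chain_availability(availabilities: list) -> float:
    """Availability of a chain where every call must succeed."""
    return math.prod(availabilities)

def downtime_minutes_per_year(availability: float) -> float:
    return (1 - availability) * 365.25 * 24 * 60

chain = chain_availability([0.999, 0.999, 0.999])
print(f"3-service chain: {chain:.4%}, "
      f"~{downtime_minutes_per_year(chain):,.0f} min/year of downtime")
print(f"single service:  {0.999:.4%}, "
      f"~{downtime_minutes_per_year(0.999):,.0f} min/year of downtime")
# -> ~1,576 min/year for the chain vs ~526 min/year for one service
```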


Storage Scale Reference

| Unit | Size | Relatable Example |
|---|---|---|
| 1 KB | 1,024 bytes | A short plain-text email |
| 1 MB | ~10⁶ bytes | A compressed photo; 1,000 average database rows |
| 1 GB | ~10⁹ bytes | ~200,000 average database rows; a small relational DB |
| 1 TB | ~10¹² bytes | ~200M database rows; a medium-sized production DB |
| 1 PB | ~10¹⁵ bytes | A large data warehouse; ~200B rows; all photos on a mid-size social platform |
| 1 EB | ~10¹⁸ bytes | Hyperscaler territory. The oft-cited "~2.5 EB of data created globally per day" dates from the early 2010s; 2024 estimates run far higher. |
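
The row counts in this table imply a rule of thumb of roughly 5 KB per average row, which is enough for quick capacity estimates. A sketch with a made-up ingest rate:

```python
# Quick capacity estimate from the ~5 KB-per-row rule of thumb implied
# by the table (1 GB ~ 200K rows). Workload numbers are hypothetical.
AVG_ROW_BYTES = 5 * 1024
ROWS_PER_DAY = 2_000_000      # hypothetical ingest rate
RETENTION_DAYS = 365
REPLICATION_FACTOR = 3        # common default for durable storage

raw = ROWS_PER_DAY * RETENTION_DAYS * AVG_ROW_BYTES
total_tb = raw * REPLICATION_FACTOR / 1e12
print(f"~{raw / 1e12:.1f} TB raw, ~{total_tb:.1f} TB with replication")
# -> ~3.7 TB raw, ~11.2 TB replicated: a medium-sized production DB
```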

Cloud Cost Ratios (Approximate)

Absolute prices change constantly and vary by provider. These ratios are more stable and more useful for design decisions.

| What You're Paying For | Relative Cost | Design Implication |
|---|---|---|
| Intra-AZ data transfer | Free | Keep hot paths in the same AZ when you can |
| Cross-AZ data transfer (same region) | ~$0.01/GB | Small but non-zero. Replication and read replicas have a cost. |
| Egress to internet (cloud → user) | ~$0.08–0.12/GB | 8–12x cross-AZ. CDNs reduce this significantly. |
| Cross-region data transfer | ~$0.02–0.08/GB | Multi-region architectures have a real data-transfer line item |
| Object storage (S3/GCS) | ~$0.02/GB/month | Cheapest durable storage. Archive tiers go 10x lower. |
| Block storage (EBS gp3) | ~$0.08/GB/month | 4x object storage, but low latency, so worth it for databases |
| Managed database storage | ~$0.12–0.25/GB/month | 6–12x object storage. Includes replication cost. |
| In-memory (ElastiCache/Redis) | ~$5–15/GB/month | Hundreds of times object storage. Only cache what earns it. |
The Data Transfer Tax

Data transfer (especially egress) is frequently the largest and most surprising cloud bill line item for data-heavy systems. A system that moves 100 TB of data per month to end users pays ~$8,000–12,000/month in egress alone, before compute or storage. Design your data locality with this in mind: keep processing close to storage, use CDNs for user-facing content, and question any architecture that requires large cross-region data movement.
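
The arithmetic behind that figure, as a reusable sketch (rates are the table's approximations, not current list prices):

```python
# Monthly data-transfer cost for the 100 TB/month example above.
# Rates are the approximate figures from the table, not list prices.
GB_PER_MONTH = 100 * 1_000    # 100 TB expressed in GB

egress_low, egress_high = GB_PER_MONTH * 0.08, GB_PER_MONTH * 0.12
xregion_low, xregion_high = GB_PER_MONTH * 0.02, GB_PER_MONTH * 0.08

print(f"Egress to users:   ${egress_low:,.0f}-${egress_high:,.0f}/month")
print(f"Cross-region copy: ${xregion_low:,.0f}-${xregion_high:,.0f}/month")
# -> $8,000-$12,000/month egress, $2,000-$8,000/month cross-region
```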
