A practitioner's guide to building scalable, fault-tolerant, reliable, maintainable, and operationally efficient systems — focused on judgment, not just mechanisms.
Most distributed systems books teach you mechanisms — how Raft works, what consistent hashing is. This book teaches you judgment — when to use which mechanism, what you're trading away, and how to reason about systems you've never seen before. The goal is to make you dangerous with a whiteboard, not just with a textbook.
Every distributed systems decision is a negotiation between consistency, availability, latency, throughput, and simplicity. There is no universally best design, only the right design for your constraints.
Every week, the project should be less ambiguous than the week before. If the list of unknowns keeps growing, something is structurally wrong.
Each chapter ends with: the key principle in one sentence, the most common mistake, and three questions for your next design review.
Undocumented assumptions are liabilities. Make every constraint, every trade-off, every failure mode visible — in design docs, in code, in runbooks, in conversations.
You will not enumerate all failure modes. Design systems that degrade gracefully under unknown failures, not just the ones you planned for.
Operational complexity, on-call burden, knowledge transfer cost — these are system properties, not afterthoughts. A system that works perfectly when its creator is available and fails mysteriously when they're not is a badly designed system.
A. The Numbers Every Engineer Should Know: latency, throughput, availability, and cost reference card
B. Back-of-Envelope Estimation: 5 worked examples (URL shortener, social feed, video platform, location service, rate limiter)
C. Recommended Reading: books, foundational papers, and engineering essays that changed how the field thinks
D. Glossary of Precise Terms: because "consistency" means four different things depending on who's talking