The Problem: You Can't Do This Anymore
Imagine you're building an e-commerce system. When a customer places an order, three things need to happen:
- Create an order record in the Orders database
- Deduct the items from inventory in the Inventory database
- Charge the customer's payment method via the Payment service
If all three succeed, great. But what if inventory deduction succeeds, then the payment fails? You now have inventory reserved for an order that was never paid for. What if the order is created, the payment goes through, but your server crashes before it can tell the inventory service? Now you've charged the customer for items that are still shown as available.
In a single monolithic application with one database, you would wrap all three in a transaction. The database gives you an atomic unit — either all of it happens, or none of it does. It's one of the best features ever invented for writing correct software.
But in a microservices world, these three operations live in separate services, owned by separate teams, backed by separate databases. There is no shared database to wrap them in. The transaction boundary stops at the edge of each service's database.
This is the problem this chapter is about.
Distributed transactions are not a technical problem waiting for a clever solution. They are a fundamental tension between two things we want: services that are independent and autonomous, and operations that span multiple services yet behave atomically. You can't have both fully. Every solution in this chapter is a different way of accepting that trade-off.
A Quick Recap of ACID
Before we look at what breaks, let's be precise about what ACID gives us inside a single database:
- Atomicity — either all operations in the transaction complete, or none do. No partial state.
- Consistency — the database moves from one valid state to another. Constraints are always satisfied.
- Isolation — concurrent transactions don't see each other's partial state. It's as if they ran one after another.
- Durability — once committed, the data survives crashes.
When you move to multiple services, each service can still give you ACID for its own data. But there's no higher-level system giving you ACID across all of them together. That's what we're trying to approximate.
Two-Phase Commit: The Traditional Answer
Two-phase commit (2PC) is the classical protocol for making a transaction atomic across multiple participants. It was invented decades ago, it is still used today, and it comes with well-known problems that are worth understanding in detail.
How it works
The protocol has two roles: a coordinator (usually the service that initiated the transaction) and multiple participants (each service or database involved). It runs in two phases. In the prepare phase, the coordinator asks every participant: "Can you commit this transaction?" Each participant votes Yes or No. In the commit phase, if every vote was Yes, the coordinator tells all participants to commit; if any vote was No or never arrived, it tells them all to abort.
Each participant, when it says "Yes, I'm ready", has to guarantee that it can commit — it writes the pending changes to a durable log and holds locks on the affected data. It cannot release those locks until it hears back from the coordinator.
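A sketch of the coordinator's side, with an invented Participant interface (real participants are databases or transaction managers, and a real coordinator also durably logs its own decision before phase two):

```python
import uuid
from typing import Protocol

class Participant(Protocol):
    """Illustrative interface, not a real library API."""
    def prepare(self, txn_id: str) -> bool: ...  # vote; a Yes means locks are held
    def commit(self, txn_id: str) -> None: ...   # make prepared changes permanent
    def abort(self, txn_id: str) -> None: ...    # discard changes, release locks

def two_phase_commit(participants: list[Participant]) -> bool:
    txn_id = str(uuid.uuid4())

    # Phase 1 (voting): every participant must durably prepare and vote Yes.
    for p in participants:
        try:
            if not p.prepare(txn_id):
                raise RuntimeError("participant voted No")
        except Exception:
            for q in participants:
                q.abort(txn_id)  # any No vote or failure aborts everyone
            return False

    # Phase 2 (completion): all voted Yes, so everyone commits.
    # If the coordinator crashes right here, every participant is stuck
    # holding locks. That gap is the blocking problem described next.
    for p in participants:
        p.commit(txn_id)
    return True
```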
The blocking problem
Here is the fatal issue with 2PC. What happens if the coordinator crashes after it receives all the "Yes" votes but before it sends the "Commit" command?
The participants cannot decide on their own. They can't commit without the coordinator's instruction (because maybe the coordinator decided to abort). They can't abort either (because maybe the coordinator decided to commit and some other participant already did). They are stuck.
This is called the in-doubt period. During this time, rows are locked, other transactions trying to access those rows will block or time out, and your system grinds to a halt.
The blocking problem is not just theoretical. Any system using 2PC at high volume will periodically have a coordinator crash. When this happens, all participants hold locks until the coordinator recovers. For systems that handle thousands of transactions per second, even a 30-second coordinator restart can queue up a backlog that takes minutes to drain. This is why companies like Google, Amazon, and Netflix built distributed systems that avoid 2PC entirely for high-throughput paths.
Three-phase commit: a partial fix
Three-phase commit (3PC) adds a "Pre-commit" phase between the vote and the commit, which allows participants to figure out what to do if the coordinator crashes — but only in networks where you can assume message delivery times are bounded. In practice, real networks are asynchronous and have no such guarantee, which is why 3PC is rarely used in production systems. It solves a theoretical problem while creating a more complex system that fails in different ways.
The Saga Pattern: A Different Philosophy
Rather than trying to make distributed operations look atomic by holding locks, the Saga pattern takes the opposite approach: accept that failures will happen, and plan for how to recover from them.
A Saga breaks a long-running business transaction into a sequence of smaller, local transactions. Each step touches one service. If a step fails, you don't roll back like a database transaction — you run a compensating transaction for each step that already succeeded, undoing its effects.
The key difference from 2PC: there are no global locks. Each step commits immediately and releases its resources. The next step proceeds without waiting for everyone else. This is dramatically more scalable.
What compensating transactions really are
A compensating transaction is not the same as a database rollback. A database rollback is as if the original operation never happened — it's invisible. A compensating transaction is a new operation that logically reverses the effect of a previous one, but the previous one did happen and may have been visible to others in the meantime.
Consider: if you created an order (Step 1) and then need to cancel it (compensating transaction), the order might have been visible to a warehouse worker for 30 seconds. The compensation creates a cancellation record — it doesn't erase the original order from history. This is important to understand when designing your data model.
Compensating transactions are a business-level concept, not a database concept. "Cancel the order" is very different from "pretend the order never existed." This means you need to think carefully about what "undoing" a step means in your domain — and accept that some operations are hard or impossible to fully compensate (for example, sending an email, or printing a shipping label).
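To make that concrete, here is a minimal sketch of a compensating transaction, assuming a psycopg-style database connection and an invented orders schema. The point is that compensation records the cancellation; it does not erase the order:

```python
def compensate_create_order(conn, order_id: str) -> None:
    # Not a rollback: the order row stays, history is preserved.
    conn.execute(
        "UPDATE orders SET status = 'CANCELLED' WHERE id = %s",
        (order_id,),
    )
    # Record WHY the state changed, so the warehouse worker who saw the
    # order for 30 seconds can later see it was cancelled, not deleted.
    conn.execute(
        "INSERT INTO order_events (order_id, event_type) "
        "VALUES (%s, 'ORDER_CANCELLED')",
        (order_id,),
    )
```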
The isolation problem with Sagas
Remember the I in ACID — isolation. Sagas give you atomicity (eventually, through compensation), but they do not give you isolation. While your Saga is running, another request can read the intermediate state of your data.
In our order example: after Step 1 (order created) and Step 2 (inventory reserved), but before Step 3 (payment), another request might read the inventory count and see the items as reserved. If Step 3 then fails and we compensate Step 2 (releasing the reservation), that other request saw a state that no longer exists. This is called a dirty read.
There are patterns to manage this — you can use semantic locks (marking a record as "pending" until the Saga completes), or design your UI to tolerate eventual consistency. But don't kid yourself: this is a real trade-off that you need to design around, not a detail to paper over.
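A semantic lock can be as simple as a status column plus a guarded update. A sketch, again assuming a psycopg-style connection and an invented inventory schema:

```python
def reserve_stock(conn, item_id: str, qty: int, saga_id: str) -> bool:
    # The status column is the semantic lock: readers and other Sagas
    # treat 'PENDING' rows as in-flight rather than as settled truth.
    cur = conn.execute(
        "UPDATE inventory "
        "SET reserved = reserved + %s, status = 'PENDING', pending_saga = %s "
        "WHERE item_id = %s AND on_hand - reserved >= %s",
        (qty, saga_id, item_id, qty),
    )
    return cur.rowcount == 1  # 0 rows updated: not enough stock, step fails

def confirm_stock(conn, item_id: str, saga_id: str) -> None:
    # Saga completed: clear the lock so readers can trust the row again.
    conn.execute(
        "UPDATE inventory SET status = 'SETTLED', pending_saga = NULL "
        "WHERE item_id = %s AND pending_saga = %s",
        (item_id, saga_id),
    )
```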
Choreography vs. Orchestration
Once you decide to use Sagas, you have a choice about how the steps are coordinated. There are two fundamentally different approaches.
Choreography: Each service reacts to events
In choreography, there is no central coordinator. Each service listens for events on a message bus and decides what to do next. When the Orders service creates an order, it publishes an OrderCreated event. The Inventory service listens for this event and reserves stock. When that's done, it publishes an InventoryReserved event. The Payment service listens for that and charges the customer. And so on.
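A sketch of the Inventory service's side of this flow, using an invented in-process MessageBus as a stand-in for a real broker client (Kafka, RabbitMQ, and so on):

```python
from typing import Callable

class MessageBus:
    """In-process stand-in for a broker; the subscribe/publish API is invented."""
    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = {}

    def subscribe(self, topic: str):
        def register(handler: Callable[[dict], None]):
            self._handlers.setdefault(topic, []).append(handler)
            return handler
        return register

    def publish(self, topic: str, event: dict) -> None:
        for handler in self._handlers.get(topic, []):
            handler(event)

bus = MessageBus()

@bus.subscribe("OrderCreated")
def on_order_created(event: dict) -> None:
    # Local transaction in the Inventory database, then announce the result.
    # No service knows the overall flow; each only knows its own events.
    print(f"reserving stock for order {event['order_id']}")
    bus.publish("InventoryReserved", {"order_id": event["order_id"]})

bus.publish("OrderCreated", {"order_id": "o-42"})  # kicks off the chain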
Choreography is simple to start with. Each service is autonomous. There's no single point of failure for the coordination itself. Adding a new step is as easy as adding a new listener.
But it has a serious problem as the system grows: the overall flow becomes invisible. No single place in your codebase describes "here is what happens when an order is placed." The logic is distributed across five event handlers in five different services. When something goes wrong — and it will — you'll be tracing events across multiple logs, multiple services, trying to reconstruct what happened.
Orchestration: A central coordinator drives the flow
In orchestration, one service — typically called a Saga orchestrator or process manager — is responsible for driving the entire workflow. It calls each service in turn, tracks progress, and decides what to do if something fails.
Orchestration makes the flow visible and explicit. When something goes wrong, you look at the orchestrator's state to see exactly which step failed. It's much easier to debug, monitor, and reason about.
The downside: the orchestrator is a new moving part in your system. It needs to be reliable and persistent (its state needs to survive crashes). It can also become a coupling point — if your orchestrator needs to know the API of every participant service, changes to those services ripple into the orchestrator.
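A minimal orchestrator sketch. Note the contrast with choreography: the step sequence and the compensation logic live in one visible place. The class and field names are invented, and a production version would persist the history durably (see the failure-handling section):

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class SagaStep:
    name: str
    action: Callable[[dict], None]
    compensation: Callable[[dict], None]

@dataclass
class SagaOrchestrator:
    steps: list[SagaStep]
    history: list[str] = field(default_factory=list)  # explicit, inspectable state

    def run(self, ctx: dict) -> None:
        completed: list[SagaStep] = []
        for step in self.steps:
            try:
                step.action(ctx)
                completed.append(step)
                self.history.append(f"{step.name}: ok")
            except Exception:
                self.history.append(f"{step.name}: failed, compensating")
                for done in reversed(completed):  # undo in reverse order
                    done.compensation(ctx)
                    self.history.append(f"{done.name}: compensated")
                raise
```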
| Dimension | Choreography | Orchestration |
|---|---|---|
| Flow visibility | Invisible — spread across services (harder) | Explicit — defined in one place (easier) |
| Coupling | Services only know about events (loose) | Orchestrator knows all services (tighter) |
| Debugging | Trace events across many services (hard) | Check orchestrator state (easy) |
| Single point of failure | No coordinator to fail (resilient) | Orchestrator must be highly reliable (requires care) |
| Adding steps | Add a new listener (easy) | Update the orchestrator (centralized change) |
| Best for | Simple, linear flows with few steps | Complex flows, long-running workflows, many failure cases |
In practice, most large systems end up with orchestration for their critical business flows, because being able to see and debug the overall state of a transaction is worth the coupling cost. Choreography tends to work well for simpler, fire-and-forget notification flows.
Idempotency: The Foundation Everything Rests On
Whether you use Sagas, 2PC, or anything else — the entire correctness of your distributed transaction system depends on one property: idempotency.
An operation is idempotent if running it multiple times with the same input has the same effect as running it once. The canonical example is setting a value: SET balance = 100 is idempotent. Running it ten times gives the same result as running it once. In contrast, ADD 10 TO balance is not idempotent — running it ten times gives a very different result.
Why does this matter so much? Because in distributed systems, you cannot reliably know if an operation was executed. A request goes out; the network times out. Did the other service receive it and process it before the timeout? Or did it never arrive? You don't know. So you retry. If the operation is not idempotent, that retry might double-charge the customer, reserve inventory twice, or create two orders.
Idempotency keys
The standard pattern for making operations idempotent is to assign each operation a unique key (usually a UUID generated by the caller) and have the server record which keys it has already processed.
A few practical details to get right:
- The key must be generated by the caller, not the server. If the server generates the key, a timed-out request has no key to retry with.
- Store the result, not just the key. The retry must get back the same response as the original call. Just storing "I've seen this key" is not enough — you need to return the original charge ID, transaction status, etc.
- The deduplication window matters. How long do you keep idempotency records? Forever is expensive. 24 hours might be fine for payment retries but not for a background job that runs weekly. Choose based on your retry pattern.
- The key scope matters. An idempotency key only deduplicates within one service. If you retry an entire Saga, each step needs its own idempotency key — typically derived from the Saga's ID plus the step number.
A clean pattern: generate one UUID for the entire Saga (e.g., saga_id = "abc123"). For each step, derive the idempotency key deterministically: Step 1 uses "abc123-step-1", Step 2 uses "abc123-step-2". This way, if you replay a Saga from the beginning after a crash, each step gets the same key it had before — and won't execute twice.
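Here is a sketch of the server side of the pattern, using SQLite for brevity; the schema and handler are invented. The essential moves: look up the key before doing any work, and record the key plus the full response in the same transaction as the work itself.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE charges (id TEXT PRIMARY KEY, amount_cents INTEGER);
    CREATE TABLE idempotency_keys (key TEXT PRIMARY KEY, response TEXT);
""")

def charge(idempotency_key: str, amount_cents: int) -> dict:
    # A retry with the same key gets the ORIGINAL response back,
    # including the original charge ID, not a second charge.
    row = conn.execute(
        "SELECT response FROM idempotency_keys WHERE key = ?",
        (idempotency_key,),
    ).fetchone()
    if row:
        return json.loads(row[0])

    charge_id = str(uuid.uuid4())
    response = {"charge_id": charge_id, "status": "succeeded"}
    with conn:  # one transaction: the charge and the key commit atomically
        conn.execute("INSERT INTO charges VALUES (?, ?)",
                     (charge_id, amount_cents))
        conn.execute("INSERT INTO idempotency_keys VALUES (?, ?)",
                     (idempotency_key, json.dumps(response)))
    return response

saga_id = "abc123"
first = charge(f"{saga_id}-step-3", 4999)
retry = charge(f"{saga_id}-step-3", 4999)
assert first == retry  # the retry is a harmless replay
```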
The Dual-Write Problem
Here's a specific failure mode that trips up most teams when they first build event-driven systems. It's subtle, and the naive solution doesn't work.
Suppose your Orders service needs to do two things when creating an order:
- Save the order to its database
- Publish an OrderCreated event to the message queue (so the Inventory service can react)
These are two different systems. You cannot do both atomically. So what order do you do them in, and what happens when the process crashes between the two?
Neither ordering is safe. Save the order first and crash before publishing, and the order exists but the event never goes out; the Inventory service never learns about it. Publish first and crash before saving, and downstream services react to an order that doesn't exist. This is not a bug you can fix by being clever. It's a fundamental consequence of having two separate systems with no shared transaction boundary.
The Outbox Pattern
The Outbox pattern solves this by turning two operations into one. Instead of writing to the database and separately publishing to the queue, you write to the database and write the event to a special outbox table in the same database transaction. A separate process (the Outbox Relay) then reads from the outbox table and publishes to the message queue.
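A sketch of both halves, assuming a psycopg-style connection, an invented outbox schema, and a bus client with a publish method. The key property: the first function touches only the database; the broker is involved only in the relay.

```python
import json
import uuid

def create_order(conn, customer_id: str, items: list[dict]) -> str:
    order_id = str(uuid.uuid4())
    with conn.transaction():  # ONE local transaction covers both writes
        conn.execute(
            "INSERT INTO orders (id, customer_id) VALUES (%s, %s)",
            (order_id, customer_id),
        )
        conn.execute(
            "INSERT INTO outbox (id, topic, payload) VALUES (%s, %s, %s)",
            (str(uuid.uuid4()), "OrderCreated",
             json.dumps({"order_id": order_id, "items": items})),
        )
    return order_id  # if this committed, the event WILL eventually go out

def relay_once(conn, bus) -> None:
    # Runs in a loop in a separate process. Publishing before deleting means
    # a crash in between re-sends the event on the next pass: at-least-once.
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox ORDER BY created_at LIMIT 100"
    ).fetchall()
    for row_id, topic, payload in rows:
        bus.publish(topic, payload)
        conn.execute("DELETE FROM outbox WHERE id = %s", (row_id,))
```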
The Outbox pattern gives you a strong guarantee: if the order is saved, the event will eventually be published. The relay might deliver the event more than once (at-least-once delivery), but never zero times. This is why consumers need to be idempotent — they'll handle the duplicate just fine.
A variation of this is the Transaction Log Tailing approach (also called Change Data Capture, or CDC). Instead of a separate outbox table, you read the database's replication log directly — the stream of all changes made to the database. Tools like Debezium do this for PostgreSQL, MySQL, and others. The advantage is you don't need to add an outbox table or change your application code. The disadvantage is operational complexity — you're now depending on the internals of your database's replication format.
Write the event to your own database first (in the same transaction as your business data), then let a relay process publish it to the external queue asynchronously. You trade immediate publication for guaranteed eventual publication.
Putting It Together: What to Use When
By now you have several tools. The question is how to choose between them.
Use a local transaction (no distributed coordination) when you can
Before reaching for any of the above patterns, ask: can I restructure my data so this operation only touches one database? This is not always possible, but it's more often possible than engineers think. The best distributed transaction is the one you don't need.
For example, if your Orders and Inventory data live in the same database (even if exposed through different services), you can use a local transaction. The services can be separately deployed and owned — they just share a database. This is sometimes called the Shared Database pattern, and it's unfairly stigmatized. For teams that are early in service decomposition, it's often the pragmatic right answer.
Use 2PC for low-volume, high-value operations across few participants
If you're processing hundreds of transactions per day (not millions), the blocking behavior of 2PC is unlikely to matter. Financial settlement systems, inter-bank transfers, and regulatory reporting pipelines often use 2PC or XA transactions because the strong guarantee is worth the overhead. The key condition: few participants (2-3), low volume, and operations where correctness is more important than throughput.
Use Sagas for high-volume, multi-step business workflows
This is the right default for most modern distributed systems. Sagas work at any throughput, don't block on coordinator failures, and map naturally to business workflows that already have compensation logic ("cancel the order", "refund the payment"). Use orchestration when the flow has more than 3-4 steps or has complex failure handling. Use choreography for simpler, linear flows.
Always use the Outbox pattern when publishing events alongside database writes
This is not optional. The dual-write problem will eventually hit you in production if you don't address it, and the consequences (silent data inconsistency) are the worst kind of bug — hard to detect and hard to fix after the fact.
A Real-World Example: Booking a Flight
Let's make this concrete with a worked example. You're building a flight booking system. When a customer books a flight, you need to:
- Reserve the seat (Flights service)
- Create the booking record (Bookings service)
- Charge the customer (Payments service)
- Send a confirmation email (Notifications service)
Step 4 is important: you cannot compensate for sending an email. Once it's sent, it's sent. This is called an irreversible operation, and it shapes the design significantly.
The right approach: put the email notification last, after everything else has committed successfully. Never put irreversible operations in the middle of a Saga. This way, if anything fails before the email step, you can compensate cleanly. You only send the email once you're certain the booking is complete.
1. Reserve seat (reversible). Mark the seat as "pending" (not "booked"). Compensation: mark as available again. Easily reversible.
2. Create booking record (reversible). Create a booking in "pending" state. Compensation: delete or mark as "cancelled". Reversible.
3. Charge payment (semi-reversible). Charge the card. Compensation: issue a refund. Not instant — refunds take days — but logically reversible. This is the hardest step to compensate, so put it last among the reversible operations.
4. Confirm seat + booking (reversible). Flip both from "pending" to "confirmed". Compensation: flip back. Still reversible.
5. Send confirmation email (irreversible — do this last). Send the email only after all prior steps have committed. If this step fails, you don't need to compensate — you just retry until the email goes out. The booking is already confirmed.
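Wired up with the orchestrator sketch from earlier, the whole flow might read like this. The service functions and the retry_until_success helper are placeholders, not a real API:

```python
saga = SagaOrchestrator(steps=[
    SagaStep("reserve-seat",   reserve_seat,   release_seat),
    SagaStep("create-booking", create_booking, cancel_booking),
    SagaStep("charge-payment", charge_payment, refund_payment),  # hardest to undo
    SagaStep("confirm",        confirm_all,    unconfirm_all),
])

saga.run(booking_ctx)  # any failure above compensates cleanly and raises

# Only reached once every reversible step has committed. On failure here we
# retry; there is no compensation for an email that already went out.
retry_until_success(send_confirmation_email, booking_ctx)
```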
"Put irreversible operations last. Never put them in the middle, where a later failure would require you to compensate for something you can't undo."
Failure Handling: The Part Most Teams Skip
Building the happy path of a Saga is not that hard. The hard part is handling every failure scenario explicitly. Here are the failures you must think through:
Step failure
A step returns an error. This is the easy case. Run the compensations. Log the failure. Alert if necessary.
Step timeout
You called the Payments service and it didn't respond within 5 seconds. Do you retry? What if it actually succeeded and the response got lost? This is why idempotency is non-negotiable — retry with the same idempotency key and you'll get the right answer either way.
Orchestrator crash mid-Saga
The orchestrator itself can crash. This means it must persist its state durably. On restart, it must be able to resume a Saga from exactly the step where it left off, or detect that a step was in-flight and retry it. If you use a workflow engine (like Temporal, AWS Step Functions, or Conductor), this is handled for you. If you build it yourself, you need a durable state store and careful recovery logic.
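If you do build it yourself, the recovery loop is conceptually small. A sketch with an invented sagas table; the crucial detail is persisting the step index before executing the step, so a crash mid-step leads to a retry (with the same idempotency key), never a skip:

```python
def resume_running_sagas(db, steps: list) -> None:
    # On orchestrator startup: pick up every Saga that was mid-flight.
    for saga_id, step_index in db.execute(
            "SELECT id, current_step FROM sagas WHERE status = 'RUNNING'"):
        for i in range(step_index, len(steps)):
            # Persist BEFORE executing: if we crash inside steps[i],
            # the next restart retries step i with the same derived key.
            db.execute("UPDATE sagas SET current_step = %s WHERE id = %s",
                       (i, saga_id))
            steps[i](saga_id, idempotency_key=f"{saga_id}-step-{i + 1}")
        db.execute("UPDATE sagas SET status = 'COMPLETED' WHERE id = %s",
                   (saga_id,))
```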
Compensation failure
What if a compensating transaction fails? For example, you're trying to release a seat reservation, but the Flights service is down. Now you have a booking that was cancelled in the Bookings service but still appears reserved in the Flights service. This is called a compensation failure, and it requires manual intervention or a dead-letter queue where failed compensations are retried until they succeed.
You'll retry failed compensations. That means releasing an inventory reservation twice must be safe. Cancelling a booking twice must be safe. Every compensating transaction has the same idempotency requirement as the forward transaction. Build this in from the start, not as an afterthought.
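In practice this usually means a guard inside the compensation itself. A sketch, with an invented reservations schema:

```python
def release_reservation(conn, reservation_id: str) -> None:
    # Safe to run any number of times: after the first run, the WHERE
    # clause matches nothing and the statement is a no-op.
    conn.execute(
        "UPDATE reservations SET status = 'RELEASED' "
        "WHERE id = %s AND status = 'HELD'",
        (reservation_id,),
    )
```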
The Bottom Line
Distributed transactions are fundamentally hard because you're trying to make multiple independent systems behave as one. Every solution is a compromise:
- 2PC gives you strong atomicity but blocks on coordinator failure
- Sagas give you scalable, non-blocking coordination but sacrifice isolation
- Choreography gives you loose coupling but makes the flow invisible
- Orchestration makes the flow visible but introduces a central coordinator
- Outbox solves dual-write but requires a relay process and idempotent consumers
There is no option on this list that has no downsides. The skill is not finding the perfect option — it's being honest about the trade-offs and choosing the one that fits your system's actual requirements.
The most important mindset shift: stop thinking about "how do I make this atomic?" and start thinking about "how do I make this recoverable?" Recoverability — being able to detect and correct inconsistency — is more achievable and more durable than trying to prevent inconsistency from ever occurring in the first place.