What's in this chapter
- Why storing state is the wrong default — and what storing events gives you instead
- The append-only log as the foundation for distributed systems (the Kafka worldview)
- Event sourcing in depth: how it works, where it breaks down
- CQRS: why the read model and write model should be separate
- The projections problem: rebuilding read models from a long event log
- The dark side: when event sourcing makes things much harder
- When to use it and — equally important — when not to
Key Learnings — If You Only Read This Section
Events are facts, state is a snapshot. An event says "this happened." State says "this is true right now." Events are immutable; state is derived. If you have the events, you can always reconstruct the state. If you only have the state, the history is gone forever.
The append-only log is the most durable data structure. It's a single, ordered sequence of facts. No updates, no deletes. Kafka, database write-ahead logs, and git commits are all the same idea.
Event sourcing gives you a time machine and an audit log for free. Because you keep every event, you can replay history, reproduce bugs, and ask questions about the past that you didn't think to ask when you built the system.
CQRS separates reading from writing. You write events to an append-only log. You read from a separate projection (a view built from those events). This sounds like extra work — and it is — but it lets you optimize reads and writes completely independently.
Rebuilding a projection from millions of events is painfully slow without snapshots. Snapshots are periodic checkpoints of derived state so you don't replay the full history every time. They add operational complexity.
Event schemas are the hardest thing to change in an event-sourced system. You can't update old events. If your event structure was wrong, you're stuck with it forever — or you build a migration path that's more complex than the original system.
Most applications don't need event sourcing. A CRUD app with a PostgreSQL database is fine. Event sourcing is valuable when audit trails, temporal queries, event-driven integration, or complex domain logic genuinely justify the overhead.
The Problem with Storing State
Imagine you're building an online bank. A user's account has a balance. The simplest thing is to store that balance directly — one row in a table, one column called balance. When money comes in, you update it. When money goes out, you update it again.
Now your customer calls and says: "I think there was an unauthorized charge on my account last Tuesday." What do you do? You look at the current balance. But that tells you nothing about last Tuesday. You've overwritten the past. It's gone.
This is the fundamental problem with mutable state. Every time you update a record, you destroy the information about what it was before. The current value is all you have.
Banks have always known this. Their ledger is not a single row that gets updated. It's a list of transactions — an append-only record of every credit and debit. The balance is not stored; it is computed by summing the transactions. The transactions are the truth. The balance is a derived view of that truth.
The traditional approach asks: what is the current state? Event sourcing asks: what happened, in order? State is just the result of replaying all the events from the beginning.
This idea — recording a sequence of events instead of updating state in place — is the foundation of event sourcing. And once you see it, you'll notice it everywhere: your database's write-ahead log, git commits, accounting ledgers, even the undo history in a text editor.
The Append-Only Log
Before we talk about event sourcing as an application pattern, let's talk about the data structure underneath it: the append-only log.
A log is the simplest possible data structure. You can only do one thing to it: add a new entry at the end. You cannot update an existing entry. You cannot delete one. Entries are ordered and numbered. That's it.
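In code, the whole contract fits in a few lines. Here's an in-memory sketch (a real log would be durable on disk and replicated):

```python
class Log:
    """An append-only log: entries are ordered, numbered, and immutable."""

    def __init__(self):
        self._entries = []

    def append(self, entry):
        self._entries.append(entry)      # the only write operation there is
        return len(self._entries) - 1    # the entry's permanent position

    def read_from(self, offset):
        return self._entries[offset:]    # consumers read at their own pace
```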
This structure is deceptively powerful. Because entries are immutable and ordered, the log has a property that most data structures don't: it is the source of truth, not a reflection of it. Every other view of the data — a balance, a dashboard, a search index — is a derived view of the log.
Why Logs Show Up Everywhere
If you look closely, logs are underneath almost every reliable system:
- A database write-ahead log (WAL) records every change before it's applied to the data files. If the database crashes, it replays the WAL to recover. The WAL is the truth; the data files are a cached view.
- Git is an append-only log of commits. Your working directory is a derived view of replaying all commits up to HEAD.
- Kafka is a distributed, persistent, append-only log that multiple consumers can read at their own pace.
- A blockchain is an append-only log with cryptographic proofs linking entries.
The pattern is the same every time: the log records what happened; everything else is computed from the log. This is not a coincidence. It reflects something true about how reliable systems should be built.
Jay Kreps (one of Kafka's creators) wrote a now-famous post called "The Log: What every software engineer should know about real-time data's unifying abstraction." His core observation: the log is the canonical record of events in a distributed system. Everything else — databases, caches, search indexes — is a materialized view of the log.
Event Sourcing
Event sourcing takes the log idea and applies it to your application's domain. Instead of storing the current state of your domain objects, you store the sequence of events that led to that state.
Let's make this concrete with an e-commerce order.
Traditional vs. Event-Sourced
In a traditional system, you have an orders table. The row for order #1234 might look like:
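| order_id | customer_id | status | total | updated_at |
|---|---|---|---|---|
| 1234 | 987 | shipped | $142.50 | 2024-03-14 16:02 |

(Columns and values here are illustrative; the point is that there's exactly one row.)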
You can see that the order was shipped. You cannot see that it was placed, then modified, then paid for, then dispatched. All that history is gone — unless you explicitly built an audit log separately, which most teams don't.
In an event-sourced system, you store the events:
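```python
# The event stream for order #1234 -- event names and payloads here are
# illustrative, not a fixed vocabulary.
events = [
    {"type": "OrderPlaced",     "order_id": 1234,
     "items": [{"sku": "A-100", "qty": 1, "price": 49.50}]},
    {"type": "OrderItemAdded",  "order_id": 1234,
     "item": {"sku": "B-200", "qty": 1, "price": 93.00}},
    {"type": "PaymentReceived", "order_id": 1234, "amount": 142.50},
    {"type": "OrderShipped",    "order_id": 1234, "carrier": "UPS"},
]
```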
To get the current state, you replay these events through a function that accumulates them:
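```python
from functools import reduce

def apply(state, event):
    """One step of the fold: previous state + one event -> next state.
    A minimal sketch; real systems usually dispatch on event classes."""
    etype = event["type"]
    if etype == "OrderPlaced":
        return {"order_id": event["order_id"],
                "items": list(event["items"]),
                "status": "placed"}
    if etype == "OrderItemAdded":
        return {**state, "items": state["items"] + [event["item"]]}
    if etype == "PaymentReceived":
        return {**state, "status": "paid"}
    if etype == "OrderShipped":
        return {**state, "status": "shipped"}
    return state  # unknown event types are ignored

current_state = reduce(apply, events, None)
# -> {'order_id': 1234, 'items': [...both items...], 'status': 'shipped'}
```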
The apply function is pure — given the same sequence of events, it always produces the same state. This is important, as we'll see.
What Event Sourcing Gives You
The benefits are real and they're significant in the right context.
Complete audit trail. Because you keep every event, you have a complete, immutable record of everything that ever happened to every entity. This is not a secondary audit table bolted on — it's the primary data. Regulators love this. Security teams love this.
Temporal queries. You can ask "what did this order look like at 2pm on Tuesday?" by replaying events up to that timestamp. With mutable state, this question is unanswerable.
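A sketch of such a query, reusing the `apply` function from earlier and assuming each stored event carries a `recorded_at` timestamp:

```python
from functools import reduce

def state_at(events, as_of):
    """Replay only the events recorded at or before `as_of`."""
    past = (e for e in events if e["recorded_at"] <= as_of)
    return reduce(apply, past, None)
```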
Debugging is fundamentally different. When a bug is reported, you can replay the exact sequence of events that led to it. You're not trying to reconstruct what happened from log messages and intuition — you have the actual events. You can run them through a corrected version of your code and see what the correct output would have been.
New read models from old data. If you decide 18 months into your product that you want a new dashboard or analytics view, you can build a new projection by replaying all your historical events through new code. You're not limited to the views you thought to build when you started.
Integration via events. Other services can subscribe to your event stream and build their own views. They don't need to query your database. They listen to what happened.
CQRS: Separating Reads from Writes
Event sourcing almost always comes packaged with another pattern called CQRS — Command Query Responsibility Segregation. The name is intimidating but the idea is simple: the model you use to change data doesn't have to be the same model you use to read data.
In a traditional system, you read from and write to the same table. This means the schema has to serve both purposes. Sometimes that's fine. But often the queries you want to run don't match the shape of the data you're writing.
With event sourcing, the write side is simple: validate a command, produce one or more events, append them to the log. The read side is a separate concern: take the events and build a projection — a view of the data optimized for querying.
Each projection is a consumer that reads the event log and builds its own read model — a database table, a search index, a cache, whatever the query pattern requires. If you need a new query, you add a new projection. The write side doesn't change.
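A minimal sketch of a projection, continuing the order example (the in-memory dict stands in for whatever read database the query pattern calls for):

```python
class OrderSummaryProjection:
    """Consumes events in log order; maintains one summary row per order."""

    def __init__(self):
        self.by_order = {}   # read model: order_id -> summary row

    def handle(self, event):
        etype, oid = event["type"], event["order_id"]
        if etype == "OrderPlaced":
            self.by_order[oid] = {"status": "placed"}
        elif etype == "PaymentReceived":
            self.by_order[oid]["status"] = "paid"
        elif etype == "OrderShipped":
            self.by_order[oid]["status"] = "shipped"
        # events this projection doesn't care about are simply skipped
```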
The read model and write model can use completely different databases. Your writes go into an event log. One projection builds a relational DB for transactional queries. Another builds an Elasticsearch index for full-text search. Another feeds a data warehouse for analytics. They all come from the same events.
The Consistency Trade-off in CQRS
There's a cost. Because projections are built asynchronously from the event log, they are eventually consistent. If a user places an order and immediately asks "what is my order status?", the projection might not have processed the OrderPlaced event yet.
This is a real problem for user-facing features, and many teams underestimate it. There are workarounds — reading directly from the event log for the most recent state, or using a version number to detect stale reads (sketched below) — but they add complexity. The simplicity of "read the same thing you just wrote" is gone.
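Here's a sketch of the version-number workaround; `last_seen_version` and the retry policy are illustrative, not a standard API:

```python
class StaleReadError(Exception):
    pass

def read_order(projection, order_id, min_version):
    """Serve the read only once the projection has caught up to the
    version the client just wrote; otherwise signal 'not yet'."""
    if projection.last_seen_version(order_id) < min_version:
        raise StaleReadError(order_id)  # caller retries or reads the log directly
    return projection.by_order[order_id]
```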
The Projections Problem
This is the part that most introductory articles about event sourcing skip. It's where the model gets hard.
A projection is built by reading the event log from the beginning and applying each event to build up the read model. If you have 100 events, this is instant. If you have 100 million events, this takes a long time. As your system ages, rebuilding projections becomes slower and slower.
Why You Need to Rebuild Projections
You will need to rebuild projections more often than you expect. Common reasons:
- You found a bug in your projection code and need to recompute from clean data
- You changed the schema of the projection
- You're adding a new projection that needs to backfill historical data
- The projection database became corrupted
- You're migrating to a new infrastructure
When your event log has accumulated years of history, "replay from the beginning" is not a fast operation. You might be looking at hours or days of rebuild time.
Snapshots: The Mitigation
The standard solution is snapshotting. Periodically, you save the current state of the projection as a checkpoint. When you need to rebuild, you start from the most recent snapshot rather than from the very beginning of the log.
Snapshots work well, but they add operational complexity: you need to store them, version them, and invalidate them when the projection code changes. A snapshot built with old projection code is wrong under the new code; you can't use it, and you're back to replaying from the beginning of the log.
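A sketch of a snapshot-aware rebuild; `store`, `snapshots`, and `code_version` are hypothetical interfaces, but the shape is typical:

```python
def rebuild(projection, store, snapshots):
    """Start from the newest snapshot that matches the current projection
    code; fall back to a full replay if none exists."""
    snap = snapshots.latest(projection.name, code_version=projection.code_version)
    if snap is not None:
        projection.restore(snap.state)
        start = snap.last_event_position + 1
    else:
        start = 0   # no usable snapshot: replay the whole log
    for event in store.read_from(start):
        projection.handle(event)
```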
Running Old and New Projections in Parallel
When you change projection code, you can't just update the running projection. You need to build the new projection alongside the old one, verify it's correct, and then atomically cut over reads from old to new. This is a deployment challenge that most teams don't plan for until they're doing it for the first time.
Projection rebuilds in production are stressful. If your event log is in Kafka, rebuilding a large projection hammers the Kafka cluster and can affect the latency of live event processing. You need to run the rebuild in a separate consumer group, with rate limiting, and test it carefully. This is unglamorous work that takes days to get right.
Event Schema Evolution — The Hardest Part
Here's the part that bites almost every team that adopts event sourcing: you cannot change events that already happened.
If you shipped an OrderPlaced event three years ago with a certain schema, those events exist in your log. They are immutable. Your projection code has to handle them. Forever.
When your requirements change — and they always do — you have a few options, none of them free:
Strategies for Evolving Event Schemas
Upcasting. When reading an old event, transform it to the new schema on the way into your projection. The raw event is unchanged; you add a translation layer that knows how to convert old formats to new ones. This works but every old event format adds permanent code complexity.
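A sketch of an upcaster, assuming a hypothetical v1 `OrderPlaced` that stored a bare `amount` where the current schema wants `total` plus an explicit `currency`:

```python
def upcast_order_placed_v1(event):
    """v1 -> v2: rename 'amount' to 'total' and add a currency field.
    (Defaulting old events to USD is a business decision, not a technical one.)"""
    new = {k: v for k, v in event.items() if k != "amount"}
    new["total"] = event["amount"]
    new["currency"] = "USD"
    new["schema_version"] = 2
    return new

UPCASTERS = {("OrderPlaced", 1): upcast_order_placed_v1}

def upcast(event):
    """Translate legacy formats on the way into the projection."""
    fn = UPCASTERS.get((event["type"], event.get("schema_version", 1)))
    return fn(event) if fn else event
```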
Versioning events. When the schema changes incompatibly, create a new event type. Instead of changing OrderPlaced, introduce OrderPlacedV2. Your projection handles both. Over time you accumulate many versions. This is technically clean but verbose.
Copy-transform. Write a migration that reads the old event log, transforms the events into the new schema, and writes them to a new log. Then cut over to the new log. Expensive, operationally risky, but produces a clean log. Only practical for breaking changes where you want to start fresh.
| Strategy | Complexity | Best For | Watch Out For |
|---|---|---|---|
| Upcasting | Low upfront, accumulates over time | Additive changes (new optional fields) | Upcast chains get deep over years |
| Event versioning | Medium | Significant schema changes | Many version cases in every handler |
| Copy-transform migration | High | Breaking changes, full schema rewrites | Risky cutover, expensive storage |
The lesson is not that schema evolution is impossible — it's that you need to treat your event schema as a permanent public API, not an internal implementation detail. Before you publish an event, ask: "Would I be comfortable maintaining backward compatibility with this schema for the next five years?" Because that's what you're committing to.
The Dark Side: When Event Sourcing Hurts
Event sourcing has genuine costs that get glossed over in enthusiastic blog posts. Let's be direct about them.
Simple Queries Become Complex
In a regular database: "give me all orders over $100 that are in 'pending' status" is one SQL query.
In an event-sourced system: you need a projection that has already computed this view. If you don't have one, you either build a new projection (wait for backfill) or scan the event log (slow and expensive). The simplicity of ad-hoc queries over relational data is gone.
You Can't Delete Data
"Delete all data for this user" — a routine GDPR request in a traditional system — becomes an architectural crisis in an event-sourced system. The user's data is baked into the event log, which is immutable.
Workarounds exist: crypto-shredding (encrypt user data with a per-user key, then delete the key), separate PII storage referenced by ID in events, explicit erasure events that projections interpret as "forget this data." None of these are simple and all require planning from day one.
If your system handles personal data covered by GDPR or similar regulations, design your erasure strategy before you write your first event. Retrofitting it later is extremely painful. The most common mistake is storing PII directly in events — encrypt it or reference it by ID from a separately deletable store.
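A sketch of crypto-shredding using the `cryptography` package's Fernet API; the in-memory key dict stands in for a real key-management service:

```python
from cryptography.fernet import Fernet

user_keys = {}   # in production: a key-management service, not a dict

def encrypt_pii(user_id, plaintext):
    """Encrypt a PII field with the user's key before it goes into an event."""
    key = user_keys.setdefault(user_id, Fernet.generate_key())
    return Fernet(key).encrypt(plaintext.encode())

def decrypt_pii(user_id, token):
    key = user_keys.get(user_id)
    if key is None:
        return None   # key was shredded: the data is gone for good
    return Fernet(key).decrypt(token).decode()

def erase_user(user_id):
    """The GDPR 'delete': destroy the key, and every copy of the ciphertext
    in the immutable log becomes unreadable."""
    user_keys.pop(user_id, None)
```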
The Learning Curve Is Steeper Than It Looks
Most developers are comfortable with CRUD. Event sourcing requires thinking in terms of commands, events, aggregates, projections, and sagas. These are learnable concepts, but there's a real ramp-up period. A team adopting event sourcing for the first time will be slower for the first few months, not faster.
The Tooling Gap
Relational databases have 40 years of tooling: ORMs, query builders, migration tools, GUI clients, backup utilities, monitoring integrations. Event stores are newer. EventStoreDB, Axon Server, and Kafka-based approaches all work, but the ecosystem is thinner. You'll encounter rough edges.
When to Use It, When Not To
Event sourcing is a powerful tool for the right problem. It is not a default choice or an architectural upgrade. Here's a practical guide.
Use Event Sourcing When:
- Audit and compliance are first-class requirements. Financial systems, healthcare records, e-commerce with dispute resolution, anywhere you need a complete, tamper-evident history. This is the strongest case.
- You need to ask questions about the past that you haven't thought of yet. If your domain is complex and you expect analytics or reporting requirements to evolve over time, the ability to replay history against new projection code is genuinely valuable.
- You're building an event-driven integration architecture. If multiple services need to react to things that happen in your domain, publishing events from an event log is cleaner than polling a database or calling APIs.
- Your domain has complex, reversible workflows. Order management, insurance claims, loan applications — domains with multi-step workflows, approvals, and reversals map naturally to events.
- Debugging production is a recurring problem. If you regularly find yourself wishing you could "replay what happened," event sourcing solves that problem structurally.
Don't Use Event Sourcing When:
- Your team is already stretched. Event sourcing adds complexity. If the team is small and moving fast, the overhead will slow you down more than the benefits will help.
- You're building a simple CRUD application. A content management system, a settings page, a user profile — these don't benefit from an immutable log of events. They'll just become harder to build and maintain.
- The domain model is unclear. Event sourcing requires you to define your events well. If you're still figuring out the domain, you'll end up with a bad event schema that's expensive to change. Get clarity first.
- You need ad-hoc queries as a core feature. If analysts or product managers need to run arbitrary queries against your data all the time, a relational database with good indexing will serve you better.
You don't have to go all-in on event sourcing. Many systems benefit from a hybrid: store mutable state in a relational database AND publish domain events to an event bus for integration. You get event-driven integration without the full complexity of event sourcing. This is often the right starting point.
Practical Implementation Notes
Aggregate Boundaries
In event sourcing, an aggregate is the unit of consistency. All events for one aggregate (e.g., one order, one account) are stored together and processed in sequence. Events across different aggregates can be processed in parallel.
Choosing aggregate boundaries is one of the most consequential design decisions. Too large: your aggregates become god objects that hold too much state and contend on writes. Too small: you find yourself needing transactions across aggregates, which is hard to do correctly in an event-sourced system.
A useful rule: an aggregate should enforce all the business invariants that need to be true at the same time. If "an order total must equal the sum of its line items" is a rule, then order and its line items should be in the same aggregate. If "an order total must be less than the customer's credit limit" requires reading the customer's state, you may need to handle that differently — possibly accepting temporary inconsistency and compensating.
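A minimal sketch of an aggregate enforcing invariants inside its own boundary (event and method names are illustrative):

```python
class Order:
    """The consistency boundary: rebuilds its own state from its events,
    checks invariants, emits new events. It never reads another aggregate."""

    def __init__(self, events=()):
        self.line_items = []
        self.shipped = False
        for e in events:            # rebuild current state from the stream
            self.apply(e)

    def apply(self, event):
        if event["type"] == "OrderItemAdded":
            self.line_items.append(event["item"])
        elif event["type"] == "OrderShipped":
            self.shipped = True

    @property
    def total(self):
        # "Total equals the sum of line items" holds by construction:
        # the total is derived, never stored independently.
        return sum(i["price"] * i["qty"] for i in self.line_items)

    def add_item(self, item):
        if self.shipped:            # invariant checked before emitting the event
            raise ValueError("cannot modify a shipped order")
        return {"type": "OrderItemAdded", "item": item}
```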
Choosing an Event Store
You have several options for where to store the event log:
- PostgreSQL with an events table: simple, uses your existing infrastructure, gets you started quickly. Append-only inserts are fast. Querying by aggregate ID is easy. Works well for moderate scale (sketched after this list).
- EventStoreDB: purpose-built for event sourcing, native support for streams per aggregate, built-in subscription model. More operationally complex to run than Postgres.
- Kafka: excellent for event distribution and multi-consumer scenarios, not ideal as the primary event store because offset-based addressing doesn't map cleanly to aggregate streams. Common to use Kafka for event distribution and a database as the source of truth.
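Here's a sketch of the PostgreSQL option, with the table layout the rest of this section assumes (table and column names are illustrative):

```python
import json
import psycopg2

DDL = """
CREATE TABLE IF NOT EXISTS events (
    stream_id   TEXT        NOT NULL,  -- aggregate identity, e.g. 'order-1234'
    version     INTEGER     NOT NULL,  -- position within the stream, from 1
    event_type  TEXT        NOT NULL,
    payload     JSONB       NOT NULL,
    recorded_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE (stream_id, version)
);
"""

def append_event(conn, stream_id, expected_version, event_type, payload):
    """Append one event; raises psycopg2.IntegrityError if another writer
    appended the same version first (see the constraint discussion below)."""
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO events (stream_id, version, event_type, payload) "
            "VALUES (%s, %s, %s, %s)",
            (stream_id, expected_version + 1, event_type, json.dumps(payload)),
        )
    conn.commit()
```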
The UNIQUE (stream_id, version) constraint is critical. It gives you optimistic concurrency control: if two processes try to append version 5 to the same stream simultaneously, one will get a unique constraint violation. The losing process re-reads the current state and retries. This prevents lost updates without requiring locking.
State is a snapshot; events are the truth. If you store what happened rather than just what is, you gain history, debuggability, and flexibility — but you trade simplicity and must treat your event schema as a permanent contract.
A common mistake is treating event sourcing as an architectural upgrade you can apply to any system to make it better. It's not. Applied to a CRUD application without a genuine need for history or event-driven integration, event sourcing adds weeks of complexity and an ongoing operational burden with no offsetting benefit. Match the tool to the problem.
Three Questions for Your Next Design Review
- Do we have genuine requirements for audit trails, temporal queries, or event-driven integration — or are we just attracted to the pattern because it sounds sophisticated?
- Have we designed our event schemas as permanent contracts, and do we have a strategy for schema evolution before the first event is written?
- How will we handle a GDPR deletion request for a user whose data is embedded in 50,000 historical events?