Chapter 36  ·  Part X — The Human System

Conway's Law and Organizational Design

"You cannot design a system better than the communication structure of the organization that builds it."

In This Chapter

Key Learnings — Quick Glance

  1. Conway's Law is not optional. Your system will reflect your org structure, whether you plan it or not. The question is whether you design both together.
  2. The Inverse Conway Maneuver says: first decide what system architecture you want, then structure your teams to match it. Don't do it backwards.
  3. Team cognitive load is a real constraint. A team can only own so much complexity before quality and speed both drop. Size services to fit a team, not the other way around.
  4. The four team types — stream-aligned, platform, enabling, complicated-subsystem — are not just labels. Each has a distinct interaction mode and a distinct failure mode.
  5. Handoff points between teams are where latency, blame, and reliability problems concentrate. Design your team boundaries to minimize these handoffs on the critical path.
  6. Ownership gaps are more dangerous than bugs. A bug has an owner. A gap has nobody.
  7. Distributed monoliths happen when you split teams without splitting the system. This is the worst of both worlds: the deployment pain of microservices with the coupling of a monolith.

Where This Idea Comes From

In 1967, a computer scientist named Melvin Conway submitted a paper to the Harvard Business Review. They rejected it. He published it in Datamation the following year, and it contained one observation that has aged better than almost anything else written about software in that era.

Here is what he wrote, in plain terms: any organization that designs a system will produce a design whose structure copies the communication structure of that organization.

That's it. That one sentence. But unpacking it properly takes the rest of this chapter, because most engineers understand it in a shallow way that misses most of the practical value.

The shallow version is: "our API is messy because our teams don't talk." That's true, but it's a post-mortem observation. The powerful version is prospective: if you know this law holds, you can use it as a design tool. You can look at a proposed org structure and predict the architecture that will emerge from it. And you can flip the process around — decide the architecture you want first, and then deliberately structure your teams to produce it.

"Organizations which design systems are constrained to produce designs which are copies of the communication structures of those organizations."
— Melvin Conway, 1968

Why the Law Holds — The Mechanism

Conway's Law is not a coincidence or a quirk. It follows from something very basic about how software gets built.

When two people work on a system together, they can coordinate informally. They sit near each other, they talk constantly, they share assumptions without writing them down. The interface between their pieces of work can be rough, implicit, and changing because fixing a misunderstanding costs almost nothing.

When two teams work on a system together, the cost of coordination goes up dramatically. The teams have different meeting schedules, different priorities, different managers, different roadmaps. The interface between their pieces of work now needs to be explicit, stable, and negotiated — because changing it requires coordinating across an organizational boundary, and that is expensive.

So teams naturally and rationally minimize the number of interfaces they have with other teams. They prefer to own things end-to-end. They build walls around what they control. These walls become the API boundaries, the service boundaries, the data ownership boundaries in your system. The org chart maps directly onto the architecture.

The Mechanism in One Line

Teams minimize coordination cost. Coordination cost follows team boundaries. Therefore, system interfaces follow team boundaries.

A Concrete Example

Imagine a company building a payments platform. They have three teams: one for the user-facing checkout flow, one for the internal payment processing engine, and one for fraud detection. No one planned the architecture explicitly — they just hired fast and divided the work.

A year later, they notice the system has exactly three major services with three distinct APIs between them. The fraud detection service is separate from the payment processing service, even though the coupling between them is so tight that every payment processor change requires a fraud team review. This tight coupling manifests as deployment coordination overhead, not clean service independence.

Nobody designed this. It emerged from the communication structure. The team that owned fraud detection drew a fence around their domain to protect their deployment independence. The payment processing team did the same. The fence became the API.

Real-World Pattern

Amazon's famous "two-pizza teams" rule and their resulting service-oriented architecture are the same mechanism from the other direction: Bezos mandated that teams communicate only through service APIs, which forced the architecture to decompose along team lines. The mandate was organizational, but the output was architectural.

The Subtler Version Most Engineers Miss

Most engineers, when they hear Conway's Law, think about the number of services. "We have 12 teams so we have 12 services." That's the obvious version. The subtler version is about the shape of the interfaces, not just the number of components.

Consider a team that owns both the database schema and the service that uses it. They will design that schema primarily for their own needs. They will optimize for their access patterns. They will change it freely when they need to because the blast radius is internal.

Now split that work across two teams: one owns the database, one owns the service. Suddenly the schema has to be versioned. Changes go through an API contract negotiation. The schema becomes more conservative and less optimized for any single access pattern because it has to serve a stable external contract. The interface has changed in character, not just in location.
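To make that character change concrete, here is a minimal sketch with hypothetical Python models (the names and fields are invented for illustration, not taken from any real codebase). When one team owns both the schema and the service, a field rename is a one-commit refactor; once another team consumes the data through a contract, the old field survives as a deprecated alias until every consumer has migrated.

    from dataclasses import dataclass
    from typing import Optional

    # One team owns schema and service: rename a field in a single commit,
    # because the blast radius is entirely internal to the team.
    @dataclass
    class Order:
        order_id: str
        total_cents: int          # freely renamed from "amount"; nobody to negotiate with

    # Two teams with a contract in between: the response is versioned, and the
    # old field lingers as a deprecated alias until every consumer has migrated.
    @dataclass
    class OrderResponseV2:
        order_id: str
        total_cents: int
        amount: Optional[int] = None      # deprecated; kept only for V1 consumers

        def __post_init__(self) -> None:
            if self.amount is None:
                self.amount = self.total_cents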

This is why the same underlying technology can look completely different in two companies. The technology did not change. The team structure changed, and the interface character changed with it.

The Reverse Also Holds

Conway's Law runs in both directions. Not only does org structure produce system structure — system structure, once established, starts to constrain org structure.

If your system has a hard boundary between the user database and the transaction database, you will eventually end up with a team that owns each. The system's architecture becomes a map that managers and HR use when building teams. New engineers get assigned to a service, which means they get assigned to a team, which means they develop loyalty to that service, which makes the boundary more rigid over time.

This feedback loop means that once a system architecture is established, it becomes hard to restructure — not because of technical inertia alone, but because it has grown an organizational immune system.

Warning: The Feedback Loop

Architectural decisions made early in a company's life often persist for a decade not because they're technically superior, but because they've grown matching organizational structures that resist change. This is one reason why large-scale re-architectures require simultaneous org changes — one without the other usually fails.

The Inverse Conway Maneuver

Once you understand that org structure and system architecture are coupled, you can use this deliberately. The Inverse Conway Maneuver says: design the organization you need to produce the system architecture you want, rather than accepting whatever architecture emerges from your current org.

This idea was named and formalized by Jonny LeRoy and Matt Simons, and later popularized by the Team Topologies book by Matthew Skelton and Manuel Pais. The logic is simple:

  1. Decide the system architecture you want.
  2. Work out what team structure, ownership boundaries, and communication paths would naturally produce that architecture.
  3. Build that team structure, and let Conway's Law generate the system.

The order matters enormously. Most companies start with people (who's available, who reports to whom) and let the architecture emerge from that. The Inverse Conway Maneuver says to start with architecture and work backwards to org structure.

The Practical Implication

Before proposing a service split, ask: can we staff an independent team around this service? A service with no clear team owner will accumulate shared ownership, which means unclear responsibility, which means nobody moves fast and nobody feels accountable when it breaks.

The Maneuver in Practice

Here is a concrete scenario. Suppose you are rebuilding a legacy monolith into services. You have decided the target architecture should have three independent domains: user identity, order management, and inventory. You want clean domain boundaries that can be deployed and scaled independently.

The wrong approach: split the codebase into three repos and assign existing engineers randomly to each. The engineers still have the same communication patterns they had before. They will informally share database schemas. They will call each other's code directly before the APIs are stabilized. Three months later, you have three repos that are just as coupled as the monolith.

The right approach: form three teams with explicit ownership, independent roadmaps, and a clear API contract process. The teams now have an organizational incentive to keep their boundaries clean, because crossing them is expensive. The API discipline emerges from the structure.

── WRONG APPROACH ──────────────────────────────────────────────────
Monolith split into three repos (but same teams, same communication)

  ┌─────────┐       ┌─────────┐     ┌─────────┐     ┌─────────┐
  │ Users   │       │ users/  │◄───►│ orders/ │◄───►│ inv/    │
  │ Orders  │  ──►  │         │     │         │     │         │
  │ Inv.    │       │shared DB│◄────┤shared DB├────►│shared DB│
  └─────────┘       └─────────┘     └─────────┘     └─────────┘

  Still coupled. Now with more YAML files.

── RIGHT APPROACH ──────────────────────────────────────────────────
Three teams with independent roadmaps + explicit API contracts

  ┌──────────────────┐  API  ┌──────────────────┐  API  ┌──────────────────┐
  │ Identity Team    │ ────► │ Orders Team      │ ────► │ Inv. Team        │
  │ owns: DB, svc    │       │ owns: DB, svc    │       │ owns: DB, svc    │
  │ deploys: own     │       │ deploys: own     │       │ deploys: own     │
  └──────────────────┘       └──────────────────┘       └──────────────────┘

  Org boundary = system boundary. Conway's Law working for you.
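One way to make the "explicit API contract process" concrete is a thin, typed client that is the only sanctioned crossing point between two of these teams. Here is a minimal sketch, assuming a hypothetical internal Identity service that exposes /v1/users/{id} (the host, path, and field names are illustrative, not a real API):

    from dataclasses import dataclass
    import json
    import urllib.request

    @dataclass(frozen=True)
    class UserSummary:
        """The contract the Identity team publishes. Orders codes against this
        shape, never against Identity's database tables."""
        user_id: str
        email: str
        is_active: bool

    class IdentityClient:
        """Owned by the Identity team, consumed as a black box by Orders."""

        def __init__(self, base_url: str = "http://identity.internal"):  # hypothetical host
            self.base_url = base_url

        def get_user(self, user_id: str) -> UserSummary:
            with urllib.request.urlopen(f"{self.base_url}/v1/users/{user_id}") as resp:
                data = json.load(resp)
            return UserSummary(
                user_id=data["user_id"],
                email=data["email"],
                is_active=data["is_active"],
            )

The transport details are beside the point. What matters is that Orders depends only on this published shape, so Identity can change its schema freely behind it.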

Team Cognitive Load: The Constraint You Are Actually Managing

Before we get into team types, there is a concept that underlies all of them: cognitive load. A team can only hold so much in their heads. There is a limit to how many services a team can own, how many technologies they can master, how many on-call incidents they can handle, and how many codebase contexts they can switch between before quality degrades.

When a team is overloaded, you see predictable symptoms: slow feature delivery, high incident rates, poor test coverage, deferred tech debt, and engineers who are always "context switching" instead of going deep on anything. These look like execution problems. They are actually design problems — the team's ownership area is too large or too complex for the team's size.

The right unit of ownership is one that a team can build, deploy, operate, and iterate on without needing deep expertise in more than 2-3 distinct technology domains, and without regularly coordinating with more than 3-4 other teams.

If a team regularly needs to coordinate with 8 other teams to ship a feature, the problem is not the team. The problem is the service boundaries.

Types of Cognitive Load

Intrinsic load is the complexity that is inherent to the problem domain — understanding business rules, domain logic, data models. You cannot eliminate this. This is the work.

Extraneous load is the unnecessary cognitive burden from accidental complexity — fighting with build systems, understanding poorly documented legacy code, managing inconsistent infrastructure, dealing with unclear ownership of shared libraries. This is waste. Your job as a designer is to reduce it.

Germane load is the effort of learning and building deep expertise in your domain. This is investment. You want teams to have room for this.

When teams are overwhelmed with extraneous load — too many services, too many technologies, too much coordination — they have no capacity for germane load. They stop getting better at their domain. They just survive each week. This is the burnout spiral that looks like a headcount problem but is actually an architecture problem.

The Design Question

For every service boundary you draw, ask: which team will own this? Can they own it with their current capacity? If the answer is "we'll figure out ownership later" or "this will be shared ownership," you are creating a future reliability problem right now.

The Four Team Types

Team Topologies describes four fundamental team types. These are not org chart boxes. They are distinct modes of operation with distinct purposes, distinct interaction styles, and distinct failure modes when misapplied.

1. Stream-Aligned Teams

A stream-aligned team is organized around a continuous flow of business value in a particular domain — a product, a user journey, a service. This is your most common team type. Most engineers work in a stream-aligned team most of their career.

A stream-aligned team owns their services end-to-end: they design them, build them, deploy them, and operate them on-call. They are the primary value-delivery unit. They move fast because they have minimal external dependencies for their core work.

The key word is "minimal." A stream-aligned team that needs to wait for three other teams to ship one feature is not actually stream-aligned — it is a coordination unit masquerading as an autonomous team.

Healthy signs: The team can go from idea to production deployment without needing another team's approval on the critical path. They own their on-call rotation. They can define their roadmap quarterly with minimal external blockers.

Failure mode: A stream-aligned team that has grown too large, owns too many domains, or has too many hard dependencies on other teams. This team starts behaving like a mini-bureaucracy — all planning, no shipping.

2. Platform Teams

A platform team builds and operates the internal infrastructure and tooling that stream-aligned teams use. Their job is to reduce the cognitive load on stream-aligned teams by providing self-service capabilities.

The key word here is self-service. A platform team that requires a ticket and a 3-day turnaround for every infrastructure request is not a platform — it is a bottleneck with documentation. A true platform team builds things that stream-aligned teams can use without engaging the platform team at all.

Think of it like a cloud provider for internal infrastructure. AWS does not require you to call them every time you provision an S3 bucket. They built an API you can use yourself. Internal platform teams should hold themselves to the same standard.
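As a sketch of what self-service can look like in code (the platform name, endpoint, and fields below are hypothetical, not a real internal API): a stream-aligned team provisions a database by calling the platform's API directly, with no ticket and no human in the loop.

    import json
    import urllib.request

    def provision_database(team: str, service: str, size_gb: int) -> str:
        """Request a database from the (hypothetical) internal platform API.
        Returns a connection string; no ticket, no platform engineer involved."""
        payload = json.dumps({
            "owner_team": team,        # ownership is declared at provisioning time
            "service": service,
            "size_gb": size_gb,
        }).encode()
        req = urllib.request.Request(
            "http://platform.internal/v1/databases",   # hypothetical endpoint
            data=payload,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["connection_string"]

    # A stream-aligned team uses it without ever talking to the platform team:
    # dsn = provision_database(team="orders", service="order-history", size_gb=50)

The details will differ in any real platform; the property to preserve is that the provisioning path is an API, not a person.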

The Most Common Platform Team Failure

Platform teams that measure success by the number of internal users rather than by the reduction in cognitive load on those users. A platform that 50 teams use but that requires 10 hours of integration work per team is not a good platform. Measure: "How long does it take a new stream-aligned team to be productive using this platform?" That number should be going down every quarter.

Failure mode: A platform team that becomes the gatekeeper for infrastructure instead of the enabler of it. Every new service needs the platform team to provision a database, register the service, configure logging. Stream-aligned teams learn to work around the platform team, which defeats the entire purpose.

3. Enabling Teams

An enabling team is a temporary or semi-permanent team of specialists whose job is to help stream-aligned teams develop capabilities they currently lack — and then step back once those capabilities are built.

Think of it like embedded coaching, not permanent consultants. An enabling team focused on observability might spend one quarter with each stream-aligned team, helping them instrument their services properly, training them on the tools, leaving behind runbooks and patterns. After that quarter, the stream-aligned team owns their observability. The enabling team moves to the next team.

The critical property: an enabling team should be making itself less needed over time. If a stream-aligned team is permanently dependent on the enabling team to do their work, the enabling team has failed at its job.

Good candidates for enabling teams: Security, observability, performance engineering, developer productivity, architecture patterns, data engineering foundations.

Failure mode: An enabling team that becomes the only people in the company who understand their domain. They feel valuable because everyone needs them. But they are actually a single point of failure and a cognitive load problem — teams can't ship without them.

4. Complicated-Subsystem Teams

A complicated-subsystem team owns a component that requires very deep specialist knowledge to develop and maintain — the kind of knowledge that takes years to build and cannot be spread across a stream-aligned team that needs to ship features at the same time.

Examples: a custom machine learning inference engine, a high-performance query optimizer, a custom cryptography implementation, a real-time graph processing engine.

The key property: this team's component should be usable by stream-aligned teams as a black box with a clean API. Stream-aligned teams should not need to understand the internals to use it. If they do, the API is wrong.

Common examples in large companies: Search ranking teams, recommendation engine teams, billing and payments core teams, identity and cryptography teams.

Failure mode: Using the complicated-subsystem label as an excuse to avoid accountability to consumers. "You wouldn't understand the internals" is not an acceptable answer to "why does this take 200ms?" Stream-aligned teams are your customers. Their experience matters.

── TEAM TOPOLOGY MAP ────────────────────────────────────────────────────

               Enabling Team (Security, Observability)
                                │
                     ┌──────────┼──────────┐
                     ▼          ▼          ▼
          Stream-Aligned  Stream-Aligned  Stream-Aligned   ◄── value delivery
            (Checkout)      (Search)        (Orders)
                 │              │               │
                 └───────┬──────┘               │
                         ▼                      ▼
                  Platform Team         Complicated Subsystem
           (Infra, CI/CD, Observ.)   (ML Ranking, Payments Core)

── INTERACTION MODES ────────────────────────────────────────────────────

  Collaboration:   two teams working closely together (temporary, with a goal)
  X-as-a-Service:  one team provides a clean API; the other team self-serves
  Facilitating:    an enabling team helps a stream-aligned team build a capability

The Distributed Monolith: The Worst of Both Worlds

There is a failure mode so common it has its own name: the distributed monolith. This is what you get when you split a monolith into services without also splitting the teams, the database, or the deployment pipeline — or when you split the teams without splitting the coupling in the code.

A distributed monolith has all the deployment complexity of microservices (separate deployments, network calls, service discovery, distributed tracing) with all the coupling of a monolith (services that must be deployed together, shared database tables that multiple services write to, service A that calls service B synchronously for every single request).

You can spot a distributed monolith by asking these questions:

  1. Can any single service be deployed on its own, or do releases have to be coordinated across several services?
  2. Do multiple services read and write the same database tables?
  3. Does nearly every request to one service trigger a synchronous call to another?
  4. Does a routine change in one service require a matching change in another, shipped at the same time?

The distributed monolith usually happens because an organization decomposed its codebase into services (which is visible and feels like progress) without decomposing its data ownership and communication patterns (which is harder and less visible).

The Diagnostic Question

Ask your engineers: "If you wanted to change the database schema for your service, how many other teams would you need to coordinate with?" If the answer is more than one, you have distributed monolith properties. The number of schema dependencies is a better measure of coupling than the number of services.

How to Escape the Distributed Monolith

The escape is not technical — it is organizational and architectural together. You need to:

  1. Identify the domain boundaries using domain-driven design concepts. What are the bounded contexts? Where does one team's data model end and another's begin?
  2. Assign clear data ownership for every table. One team owns the write path. Others access data through APIs, not direct SQL.
  3. Break synchronous call chains. Services that must call each other synchronously in real time are coupled in the worst way. Move to events where latency allows (a minimal sketch of steps 2 and 3 follows this list).
  4. Create independent deployment pipelines. If two services share a CI/CD pipeline or always need to be deployed together, they are not independent services.
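As a minimal sketch of steps 2 and 3 together, with hypothetical names throughout: the Orders service stops writing to inventory tables directly and instead publishes a fact that the Inventory team consumes on its own schedule. The publish function below stands in for whatever broker client you already run (Kafka, SNS, and so on); the point is the ownership boundary, not the transport.

    import json
    from dataclasses import dataclass, asdict
    from datetime import datetime, timezone

    @dataclass
    class OrderPlaced:
        """Event published by Orders. Inventory owns its own tables and reacts
        to this fact, instead of Orders writing to inventory tables directly."""
        order_id: str
        sku: str
        quantity: int
        placed_at: str

    def publish(topic: str, event: OrderPlaced) -> None:
        """Stand-in for a real broker client (Kafka producer, SNS publish, ...)."""
        print(f"PUBLISH {topic}: {json.dumps(asdict(event))}")

    # Before: the orders service ran UPDATE inventory SET ... on another team's table.
    # After: it publishes a fact, and the owning team decides how to apply it.
    publish("orders.order-placed", OrderPlaced(
        order_id="o-123",
        sku="sku-42",
        quantity=2,
        placed_at=datetime.now(timezone.utc).isoformat(),
    ))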

None of these steps are fast. Escaping a distributed monolith typically takes as long as the original wrong architecture took to build. Which is the main reason to avoid building one in the first place.

Ownership Gaps: Where Reliability Quietly Dies

An ownership gap is a piece of code, infrastructure, or a data pipeline that either has no clear owner or has multiple owners who all assume someone else is responsible.

Ownership gaps look harmless until something breaks at 2am. Then everyone points at everyone else. The incident takes twice as long to resolve because three teams are involved and none of them are certain it's their job to fix it. The post-mortem correctly identifies the technical cause but never fixes the underlying issue: nobody owned that piece.

Ownership gaps are created in predictable ways: a reorg moves people to new teams but nobody reassigns the services the old team owned; the engineer who built a component leaves and their context leaves with them; a "temporary" script or pipeline quietly becomes load-bearing without ever being assigned a team; or a component is declared "shared" between two teams, and each assumes the other is looking after it.

The Ownership Audit

Once a year, every engineering organization should do a simple exercise: list every service, every database, every scheduled job, every infrastructure component. For each one, name a single team as owner. If you cannot name a team, that is an ownership gap. Assign it deliberately — don't let gaps accumulate silently. A gap found in an audit is a risk. A gap found during a P0 incident is a crisis.
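The audit is easy to mechanize. Here is a minimal sketch, assuming a hypothetical services.json catalog in which every entry carries a name and an owner field:

    import json

    def find_ownership_gaps(catalog_path: str = "services.json") -> list[str]:
        """Return every component that has no single accountable team."""
        with open(catalog_path) as f:
            components = json.load(f)   # e.g. [{"name": "billing-cron", "owner": ""}, ...]
        gaps = []
        for component in components:
            owner = (component.get("owner") or "").strip().lower()
            if owner in ("", "tbd", "shared"):   # shared ownership counts as a gap here
                gaps.append(component["name"])
        return gaps

    if __name__ == "__main__":
        for name in find_ownership_gaps():
            print(f"OWNERSHIP GAP: {name}")

The catalog format is an assumption; the useful part is running the check on a schedule rather than waiting for the next incident to reveal the gap.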

The Shared Ownership Trap

Shared ownership is not the solution to an ownership gap. Shared ownership often IS the gap, just with documentation.

When something is owned by two teams, each team will rationally assume that the other team is handling maintenance, on-call response, and upgrades. This is not malice — it is rational behavior under ambiguous ownership. The result is that shared components get less attention than either team would give a singly-owned component.

If a component genuinely needs to serve multiple teams, the right answer is to put it in the platform team and have them own it properly — as a service, with an SLO, with a defined API, and with one team's name on the pager.

Practical Advice: Using Conway's Law as a Design Tool

Here is how to actively use these ideas when making architectural and organizational decisions.

When Starting a New System

Sketch the target architecture first. Draw the services, the data flows, the ownership boundaries. Then look at each service boundary and ask: "Which team will own this?" If you cannot name a team, the service boundary is not ready to be built. Either name a team or merge the service with another one that has a clear owner.

When Reviewing a Proposed Org Restructure

Map the proposed team structure onto the current system architecture. Every team boundary is a potential system interface. Ask: do we want a hard system boundary here? If the new team boundary cuts through the middle of a tightly coupled subsystem, you will either have high coordination overhead or you will eventually need to decouple the subsystem. Surfacing that cost early is valuable.

When Diagnosing Slow Feature Delivery

Before blaming process or headcount, draw the org chart and the system architecture side by side. Count how many team boundaries a typical feature request crosses. If a feature that should take two weeks requires coordination with four teams, the problem is the system boundary design, not the team's execution speed. No amount of process improvement will fix a coordination problem that is structural.

When Planning a Re-architecture

Plan the team changes in parallel with the technical changes. A re-architecture without an org change will produce the same architecture it replaced, for the reasons described above. The org change does not need to happen before the technical work — but it needs to be planned concurrently, with explicit milestones for when ownership transitions happen.

A Simple Test

For any proposed architectural change, ask: "If this were in production today, which team would be paged at 3am?" If the answer is unclear, the architecture is incomplete. Ownership is not a post-deployment detail — it is an architectural requirement.

The Limits of Conway's Law

Conway's Law is a powerful lens, but it is not the only lens. A few things it does not explain:

Technical excellence still matters. A well-structured org with poor engineering practices will build a poorly designed system anyway. Conway's Law tells you that structure enables or constrains good design — it does not guarantee it.

Company size changes the dynamics. In a 10-person startup, Conway's Law barely applies — everyone talks to everyone, team boundaries are fluid, the architecture is driven mostly by technical decisions. The law becomes more powerful as companies grow and communication pathways become more constrained.

Outsourced systems behave differently. When you buy a third-party service or use a major platform (AWS, Stripe, Twilio), you are consuming an architecture designed by another company's org structure. Their architecture becomes your constraint, regardless of your own org structure. This is one of the hidden costs of heavy third-party dependencies.

Forced interfaces can be good. Sometimes a rigid API boundary between teams is a feature, not a bug. It enforces information hiding, prevents tight coupling, and gives each team freedom to change internals without negotiating with others. The law describes what happens — it does not say that what happens is always bad.

The Key Principle

You cannot design a system better than the communication structure of the organization that builds it — but you can design the organization to produce the system you want. Use Conway's Law deliberately, not as an excuse.

The Most Common Mistake

Splitting a monolith into services without splitting the teams, data ownership, and deployment pipelines. This produces a distributed monolith — all the complexity of microservices with all the coupling of a monolith.

Three Questions for Your Next Design Review

  1. For every service boundary proposed — which team will own it, operate it, and be paged for it?
  2. How many team boundaries does a typical feature cross? If it's more than two, is the architecture causing that, or is it unavoidable?
  3. Is there any component, database, or job that has no single clear owner today? What is the plan to fix that?