Quorum: One Overlap Rule, Two AWS Use Cases
You run a datastore of 3 nodes — 3 copies of the same data. An application just wrote a new value, and a moment later another request reads it back. But right then, one node is lagging behind on replication, or an entire AZ is flickering. How can you be sure that read returns the latest write rather than stale data?
And in a problem that sounds completely unrelated: when the node coordinating the cluster dies, how does the cluster elect exactly one replacement — and not two nodes both claiming to be in charge, overwriting each other?
Two very different questions, but one idea solves both: quorum. At its core, a quorum just forces subsets to overlap — so that any two operations always “meet” on at least one shared member.
1. How Quorum Works
Quorum is originally a word from meetings: the minimum number of members who must be present for a body to make a valid decision. A 9-person board might set its quorum at 5 — with fewer than 5 present, any vote is void; but once 5 are there, their decision stands for the whole board, even if the 4 absentees would later object. The key idea: a large-enough agreeing group can represent the whole body — you don’t need everyone.
So what is a quorum for? Quorum was created to solve the problem of making decisions by consensus.
You’ve almost certainly run into a very common form of quorum in real life before: if a decision is approved by more than 50% of the members of a group, that decision passes.
That is exactly the majority quorum: a set larger than half, i.e. ⌊N/2⌋ + 1 members (for example 2 of 3, or 3 of 5). Since every majority quorum is larger than N/2, any two majority quorums satisfy |Q₁| + |Q₂| > N; in other words, any two majority groups always overlap. It is the smallest quorum size for which any two quorums of the same size are guaranteed to overlap, and it is the foundation of most consensus algorithms.
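The overlap claim is easy to check by brute force on small groups. The sketch below is only an illustration; the helper names `majority` and `always_overlap` are made up for this example. It enumerates every pair of subsets of the given sizes and confirms that they always share a member exactly when the sizes add up to more than N.

```python
from itertools import combinations


def majority(n: int) -> int:
    """Smallest group size strictly larger than half of n members."""
    return n // 2 + 1


def always_overlap(n: int, q1: int, q2: int) -> bool:
    """Brute force: does every q1-subset share a member with every q2-subset?"""
    members = range(n)
    return all(
        set(a) & set(b)
        for a in combinations(members, q1)
        for b in combinations(members, q2)
    )


for n in (3, 4, 5):
    m = majority(n)
    # Majority-sized groups always overlap; groups one member smaller may not.
    print(n, m, always_overlap(n, m, m), always_overlap(n, m - 1, m - 1))
```

For N = 4, two groups of 2 can be disjoint, which is why an even-sized cluster still needs 3 members for a majority.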
The cost: the larger the quorum, the stronger the guarantee, but the more live members you need and the higher the latency of each operation, since you wait for more members to respond.
So far quorum is just an abstract inequality. The interesting part is that this very inequality — |Q₁| + |Q₂| > N — sits behind two seemingly unrelated AWS problems below. Each problem is simply a way of assigning Q₁ and Q₂ to concrete roles.
2. Use Case 1 — Read/Write Consensus (DynamoDB & Aurora)
The first problem: how does a distributed database guarantee read-after-write consistency, that is, that a user who has just written a value reads that value back rather than an older one?
You’ll run into this many times in practice: you’ve just added a product to an online shopping cart, you open the cart to check and see nothing, and after reloading the cart, the product shows up.
It occurs to you that if every read and write had to be agreed upon across all nodes of the distributed database, users would always be guaranteed the latest data.
But that decision drives operation latency way up: instead of reading or writing just one node, every operation now has to wait for all nodes to respond.
So you adjust it slightly: if only writes must be agreed upon across all nodes, then reading any single node always returns the latest data.
But this change only fixes the latency of read traffic: every write still has to wait for all nodes, and a single unreachable node blocks writes entirely.
This is where quorum shows its power.
We assign the two quorums from Section 1 to concrete roles. The first quorum is the write set (W) — the number of copies that must confirm the write before it’s reported successful. The second is the read set (R) — the number of copies you query on a read. The overlap constraint |Q₁| + |Q₂| > N becomes the classic formula:
W + R > N
Because the write set and read set are forced to share at least one copy, every read touches at least one copy holding the latest write; the reader tells it apart from stale copies by comparing a version number or timestamp stored with each value. Add one more condition, W > N/2, so that two writes can never complete on two disjoint sets, preventing conflicting overwrites.
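To make the W + R > N rule concrete, here is a minimal in-memory sketch. It is not how any particular database implements it: the `QuorumStore` and `Replica` names are hypothetical, a single writer hands out version numbers, and a random choice of replicas stands in for "whichever copies happened to respond first".

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    version: int = 0
    value: Optional[str] = None


class QuorumStore:
    def __init__(self, n: int, w: int, r: int, seed: int = 0) -> None:
        if w + r <= n:
            raise ValueError("need W + R > N so read and write sets overlap")
        self.replicas = [Replica() for _ in range(n)]
        self.w, self.r = w, r
        self.next_version = 0  # assumption: one writer assigns versions
        self.rng = random.Random(seed)

    def write(self, value: str) -> None:
        # Only W replicas acknowledge the write; the others stay stale.
        self.next_version += 1
        for rep in self.rng.sample(self.replicas, self.w):
            rep.version, rep.value = self.next_version, value

    def read(self) -> Optional[str]:
        # Ask R replicas and keep the reply with the highest version.
        replies = self.rng.sample(self.replicas, self.r)
        return max(replies, key=lambda rep: rep.version).value


store = QuorumStore(n=3, w=2, r=2)
store.write("cart: 1 item")
store.write("cart: 2 items")
print(store.read())
```

With N = 3, W = 2, R = 2, any 2 replicas that answer a read must include at least one of the 2 that acknowledged the last write, so the highest-version reply is always the latest value.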
DynamoDB
Amazon DynamoDB keeps 3 replicas of each partition of a table, placed across 3 different AZs, with one replica acting as the leader (the only copy that takes a write first, then propagates it to the others). A write is acknowledged as soon as the log record is durably stored on 2 of the 3 replicas, i.e. the leader plus one follower. That’s W = 2 out of N = 3.
Reads come in two modes, and this is the distinction you need to be clear on:
- Strong read (strongly consistent): always served from the leader, so it always sees the latest write. If the leader is unreachable, this read fails.
- Eventual read (eventually consistent): served from any replica, faster and cheaper, but may return slightly stale data if it hits a replica that hasn’t caught up yet.
A common exam point: the 2/3 figure is an internal detail of DynamoDB; you can’t tune it. Unlike Cassandra or the original Dynamo (which let you set N, W, R yourself), DynamoDB gives you exactly one switch — choosing strong vs eventual read on each API call.
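In the API, that switch is the `ConsistentRead` flag on read operations such as GetItem; it defaults to false, which means an eventually consistent read. The sketch below only builds the request arguments so it runs without AWS access. The table name `Carts`, the key attribute `pk`, and the helper `get_item_request` are hypothetical.

```python
from typing import Any, Dict


def get_item_request(table: str, key: Dict[str, Any], strong: bool) -> Dict[str, Any]:
    """Build keyword arguments for a DynamoDB GetItem call."""
    return {"TableName": table, "Key": key, "ConsistentRead": strong}


# Hypothetical table and key, for illustration only.
cart_key = {"pk": {"S": "cart#42"}}
strong_read = get_item_request("Carts", cart_key, strong=True)
eventual_read = get_item_request("Carts", cart_key, strong=False)
print(strong_read)

# With boto3 installed and AWS credentials configured, a request could be sent as:
#   import boto3
#   item = boto3.client("dynamodb").get_item(**strong_read).get("Item")
```

The choice is made per call, so the same application can use strong reads for the cart page and eventual reads for less sensitive views.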
Aurora
Amazon Aurora pushes the same idea further at the storage layer. Each piece of data has 6 copies spread across 3 AZs, 2 copies per AZ. Aurora sets a write quorum Vw = 4 and a read quorum Vr = 3, satisfying both conditions above: 4 + 3 > 6 and 4 > 6/2. A write completes as soon as 4 of 6 copies confirm. As a result Aurora survives the loss of an entire AZ (2 copies) and still accepts writes, and survives an AZ plus one extra copy without losing data: the 3 remaining copies still meet the read quorum, although writes pause until a fourth copy is back. Note that Aurora ships only redo log records to the 6 copies rather than full data pages.
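The failure arithmetic behind those claims can be worked through in a few lines. This is plain counting based on the figures in the text, not a model of Aurora itself; the `status` helper and scenario names are made up for illustration.

```python
N, VW, VR = 6, 4, 3      # copies, write quorum, read quorum (from the text)
COPIES_PER_AZ = 2


def status(failed_copies: int) -> dict:
    """What the remaining copies can still do after some copies are lost."""
    alive = N - failed_copies
    return {
        "alive": alive,
        "write_quorum": alive >= VW,
        "read_quorum": alive >= VR,
    }


scenarios = {
    "one AZ down": COPIES_PER_AZ,
    "one AZ plus one copy": COPIES_PER_AZ + 1,
    "two AZs down": 2 * COPIES_PER_AZ,
}
for name, failed in scenarios.items():
    print(name, status(failed))
```

Losing one AZ leaves 4 copies, enough for both quorums. Losing an AZ plus one copy leaves 3, which still meets the read quorum but not the write quorum. Losing two AZs leaves 2, which meets neither.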
There’s a common misconception worth correcting here: in steady state, Aurora doesn’t really do quorum reads. The Vr = 3 figure exists so that the inequality Vw + Vr > N always holds, a design constraint that keeps the system safe after failures. But on the hot read path, Aurora doesn’t gather 3 of 6 copies: the writer knows exactly which copy holds the latest version of each block, so it reads straight from that single copy. The 3/6 read quorum only really kicks in during recovery: an instance restarting, a replica being promoted to a new writer, or a lost copy being rebuilt.
Two services, two ways of realizing the same overlap math: DynamoDB uses a single leader as the consistent read point; Aurora uses a single writer + log and reads straight from the copy it already knows is current.
3. Use Case 2 — Leader Election
The second problem sounds entirely different: in a cluster, we need exactly one node to be the leader — the coordinating node, the only one allowed to write or make decisions. When the leader dies, the cluster must elect a replacement, but it must never elect two.
Surprisingly, the same principle from Section 1 solves it. This time both quorums are one and the same: the set of votes. A candidate becomes leader only by collecting a majority of votes. Substituting Q₁ = Q₂ = M (the number of votes needed) into the overlap constraint gives M + M > N, i.e. each leader needs M > N/2 votes. In a 3-node cluster, a candidate needs 2 votes: its own plus one more.
Why does this prevent two leaders? Because any two majority sets are forced to share at least one voter — exactly the overlap property from Section 1. And each voter casts only one vote per term. A member can’t vote for both candidate A and candidate B in the same term; yet two majorities must share at least one voter — so at most one candidate can reach a majority. At most one leader per term.
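The argument above can be sketched as a toy election. This is an illustration of the "one vote per term, majority wins" rule only, not a full consensus protocol such as Raft or Paxos; the `Voter` class and `elect` function are hypothetical names, and vote requests are simply processed in the order given.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


class Voter:
    def __init__(self) -> None:
        self.voted_for: Dict[int, str] = {}  # term -> candidate voted for

    def request_vote(self, term: int, candidate: str) -> bool:
        # Grant the vote only if this voter has not backed someone else this term.
        return self.voted_for.setdefault(term, candidate) == candidate


def elect(n: int, term: int, requests: List[Tuple[str, int]]) -> Optional[str]:
    """Process (candidate, voter_id) requests in order; return the majority winner, if any."""
    voters = [Voter() for _ in range(n)]
    granted: Dict[str, Set[int]] = defaultdict(set)
    for candidate, voter_id in requests:
        if voters[voter_id].request_vote(term, candidate):
            granted[candidate].add(voter_id)
    needed = n // 2 + 1
    winners = [c for c, ids in granted.items() if len(ids) >= needed]
    return winners[0] if winners else None


# 3 nodes: A and B each vote for themselves, then both ask node 2; A's request arrives first.
print(elect(3, term=1, requests=[("A", 0), ("B", 1), ("A", 2), ("B", 2)]))
# 4 nodes split 2/2: nobody reaches the 3 votes needed, so the term has no leader.
print(elect(4, term=1, requests=[("A", 0), ("A", 1), ("B", 2), ("B", 3)]))
```

Because each voter grants at most one candidate per term, two candidates can never both collect a majority; a split vote simply leaves the term without a leader until a later term is tried.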
There’s one practical gap left: the handover. The old leader may not yet know it’s been replaced (due to network delay) while the new leader has just come up; in that instant, both think they’re in charge. This is exactly split-brain. The fix is a lease: leadership is granted for a limited time, the old leader stops serving once its lease runs out, and the new leader must wait for the old leader’s lease to expire before it starts serving writes. As long as the nodes’ clocks drift only within a known bound, there’s never a window in which two leaders both accept writes.
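A lease can be sketched as a small record with a holder and an expiry time. The version below is a deliberate simplification: the `Lease` class is hypothetical, the time is passed in explicitly, and all nodes are assumed to read the same clock. Real systems have to allow for clock drift and message delay.

```python
from typing import Optional


class Lease:
    def __init__(self, duration: float) -> None:
        self.duration = duration
        self.holder: Optional[str] = None
        self.expires_at = 0.0

    def try_acquire(self, node: str, now: float) -> bool:
        """Take or renew the lease; a different node must wait for expiry."""
        if self.holder in (None, node) or now >= self.expires_at:
            self.holder, self.expires_at = node, now + self.duration
            return True
        return False

    def may_serve_writes(self, node: str, now: float) -> bool:
        # A leader serves writes only while its own lease is still valid.
        return self.holder == node and now < self.expires_at


lease = Lease(duration=5.0)
lease.try_acquire("old-leader", now=0.0)
print(lease.try_acquire("new-leader", now=3.0))       # still held by the old leader
print(lease.may_serve_writes("old-leader", now=5.0))  # lease has run out
print(lease.try_acquire("new-leader", now=5.0))       # handover is now allowed
```

At no point in this trace do both nodes pass `may_serve_writes` for the same instant: the old leader loses the right to serve at the moment the new leader becomes able to take over.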
The interesting part: this isn’t distant theory. DynamoDB from Section 2 elects a leader for each group of 3 replicas in exactly this way — the leader holds its role via a lease, refreshed by heartbeats; when the heartbeats stop, the remaining replicas elect a new leader. One quorum principle running both use cases.
Conclusion
Back to the two opening questions. The read returns the latest write because the read set and the write set are forced to overlap. And the cluster never has two bosses because any two majority groups are forced to share at least one vote. One overlap principle, two problems.
Key takeaways:
- Quorum is a neutral principle, not a read/write or voting problem. The core is simply: two quorums with |Q₁| + |Q₂| > N always overlap. Read/write and leader election are just two applications.
- A majority quorum (> N/2) guarantees that any two majority groups always overlap — the foundation for both data consistency and electing exactly one leader.
- Read/write: assign the roles as W + R > N. DynamoDB uses 3 replicas, a write quorum of 2/3 (internal, not tunable), strong reads from the leader, eventual reads from any replica.
- Aurora: write quorum 4/6, read quorum 3/6, surviving an AZ plus one copy — but a normal read goes straight to a single copy it already knows is current, not a quorum read.
- Leader election: assign the roles as M + M > N — majority votes, one vote per term, plus a lease to prevent split-brain.
In short, the real purpose of a quorum isn’t to keep the system “always alive,” but to ensure that when it’s alive, there is exactly one truth.