Raft FAQ

Q: What do people use Raft for?

A: The most frequent use for Raft (and Paxos) is to build fault-tolerant "configuration services" whose job is to keep track of how responsibilities are currently assigned to servers in a large deployment. This job is particularly sensitive for deployments with replication; Raft-based configuration services are often used to select primaries in a way that avoids split brain. The VMware FT test-and-set server is a simple example of a configuration service. Chubby, ZooKeeper, and etcd are more powerful fault-tolerant configuration services based on Raft or Paxos; they are widely used.
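
As a rough illustration of how a test-and-set service can prevent split brain, here is a minimal Go sketch. The names TestAndSet and Acquire are made up for this example and are not taken from the VMware FT paper. The sketch is a single in-memory object; a real configuration service would itself be replicated.

```go
package main

import "sync"

// TestAndSet is a hypothetical, in-memory stand-in for a configuration
// service. A real one would be replicated (for example with Raft) so
// that it is not itself a single point of failure.
type TestAndSet struct {
	mu      sync.Mutex
	primary string // "" means no primary has been chosen yet
}

// Acquire atomically records candidate as primary if no primary is set,
// and returns whoever holds the slot after the call. A caller that gets
// back a name other than its own must not act as primary.
func (t *TestAndSet) Acquire(candidate string) string {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.primary == "" {
		t.primary = candidate
	}
	return t.primary
}
```

If two servers each believe the other is dead and both call Acquire, only the first call records its argument; the second gets back the first server's name and knows it must not take over.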

Some databases, such as Spanner, CockroachDB, and Lab 3, use Raft or Paxos to replicate the data. (In contrast, GFS, VMware FT, and Chain Replication use simpler primary-backup for the data.) Some databases use Raft or Paxos in two different ways: for the configuration service that assigns responsibilities to servers (for every shard, who is currently primary and who are backups), and separately to handle the data within each shard.
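
To make the "who is primary, who are backups" bookkeeping concrete, here is one possible shape for the state a configuration service might replicate. This is only a sketch: Config, ShardAssignment, and PrimaryFor are hypothetical names chosen for illustration, not the format used by any of the systems mentioned above.

```go
package main

// ShardAssignment records which servers are responsible for one shard.
// (Hypothetical type, for illustration only.)
type ShardAssignment struct {
	Primary string
	Backups []string
}

// Config is one version of the assignment of shards to servers.
type Config struct {
	Num    int                     // configuration number, incremented on each change
	Shards map[int]ShardAssignment // shard ID -> current assignment
}

// PrimaryFor returns the current primary for a shard, and false if the
// shard has no assignment in this configuration.
func (c Config) PrimaryFor(shard int) (string, bool) {
	a, ok := c.Shards[shard]
	return a.Primary, ok
}
```

The configuration service would use Raft or Paxos to agree on the sequence of Config values, while each shard's servers would separately replicate the shard's data.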

Q: Does Raft sacrifice anything for simplicity?

A: Raft gives up some performance in return for clarity; for example:

Every operation must be written to disk for persistence; good performance probably requires batching many operations into each disk write.
There can only usefully be a single AppendEntries in flight from the leader to each follower: followers reject out-of-order AppendEntries, and the sender's nextIndex[] mechanism assumes they are handled one at a time. A provision for pipelining many AppendEntries would be better.
The snapshotting design is only practical for relatively small states, since it writes the entire state to disk. If the state is big (e.g. if it's a big database), you'd want a way to write just parts of the state that have changed recently.
Similarly, bringing recovering replicas up to date by sending them a complete snapshot will be slow, needlessly so if the replica already has a snapshot that's only somewhat out of date.
Servers may not be able to take much advantage of multi-core because operations must be executed one at a time (in log order).

These could be fixed by modifying Raft, but the result might have less value as a tutorial.
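
To see why a follower rejects an AppendEntries that arrives ahead of its predecessor, here is a simplified Go sketch of the follower-side log consistency check from the Raft paper's Figure 2. It is not a complete handler: the term comparison, commit index, and persistence are left out, and the Follower and Entry types are assumptions made for this example.

```go
package main

// Entry is one log entry. (Simplified type for illustration.)
type Entry struct {
	Term    int
	Command string
}

// Follower holds just a log. Index 0 is a sentinel entry with term 0,
// so real entries start at index 1.
type Follower struct {
	log []Entry
}

// NewFollower returns a follower whose log contains only the sentinel.
func NewFollower() *Follower {
	return &Follower{log: []Entry{{Term: 0}}}
}

// AppendEntries performs only the log-matching part of the RPC handler.
// It returns false unless the log has an entry at prevLogIndex whose
// term equals prevLogTerm.
func (f *Follower) AppendEntries(prevLogIndex, prevLogTerm int, entries []Entry) bool {
	if prevLogIndex < 0 || prevLogIndex >= len(f.log) ||
		f.log[prevLogIndex].Term != prevLogTerm {
		return false
	}
	for i, e := range entries {
		idx := prevLogIndex + 1 + i
		if idx < len(f.log) {
			if f.log[idx].Term == e.Term {
				continue // already have this entry
			}
			// Conflict: drop the existing entry and everything after it.
			f.log = f.log[:idx]
		}
		f.log = append(f.log, e)
	}
	return true
}
```

If the leader sent entry 2 before the follower had processed entry 1, the call carrying entry 2 would have prevLogIndex 1, which is past the end of the follower's log, so the check fails and the leader has to retry.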

Q: Is Raft used in real-world software, or do companies generally roll their own flavor of Paxos (or use a different consensus protocol)?

A: There are several real-world users of Raft: Docker (https://docs.docker.com/engine/swarm/raft/), etcd (https://etcd.io), and MongoDB. Other systems said to be using Raft include CockroachDB, RethinkDB, and TiKV. Maybe you can find more starting at http://raft.github.io/

On the other hand, many real-world state-machine replication systems (Google's Chubby, ZooKeeper's ZAB) are derived from the older Multi-Paxos and Viewstamped Replication protocols.

Q: What is Paxos? In what sense is Raft simpler?

A: There is a protocol called Paxos that allows a set of servers to agree on a single value. While Paxos requires some thought to understand, it is far simpler than Raft. Here's an easy-to-read paper about Paxos:

http://css.csail.mit.edu/6.824/2014/papers/paxos-simple.pdf

However, Paxos solves a smaller problem than Raft. To build a real-world replicated service, the replicas need to agree on an indefinite sequence of values (the client commands), and they need ways to efficiently recover when servers crash and restart or miss messages. People have built such systems with Paxos as the starting point; look up Google's Chubby and Paxos Made Live papers, and ZooKeeper/ZAB. There is also a protocol called Viewstamped Replication; it's a good design, and similar to Raft, but the paper about it is hard to understand.

These real-world protocols are complex, and (before Raft) there was not a good introductory paper describing how they work. The Raft paper, in contrast, is relatively easy to read and fairly detailed. That's a big contribution.

Whether the Raft protocol is inherently easier to understand than something else is not clear. The issue is clouded by a lack of good descriptions of other real-world protocols. In addition, Raft sacrifices performance for clarity in a number of ways; that's fine for a tutorial but not always desirable in a real-world protocol.