1. Failures
<aside>
💡 Fail-stop failures: failure ⇒ the computer stops
</aside>
Replication can’t solve problems like:
- Logic bugs
 
- Configuration errors
 
- Malicious errors
 
And may solve problems like:
- Fail-stop failures
2. Challenges
- Has primary actually failed?
- Can’t tell the difference between a network partition and a machine failure
 
- May cause a split-brain system
 
 
- How do we keep the primary and backup in sync?
- Apply all changes in the right order
 
- Deal with non-determinism
 
 
- Fail over
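
The ordering and non-determinism points above can be sketched in a few lines. This is only an illustration, not a real replication protocol: `LogEntry`, `Primary`, `Backup`, and the sequence-number scheme are hypothetical names invented for this example. The idea shown is that the primary captures a non-deterministic input (here a clock read) once and records it in the log entry, while the backup replays the recorded value and applies entries strictly in sequence order, even when they arrive out of order.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    seq: int          # position in the primary's log
    key: str
    timestamp: float  # non-deterministic value captured on the primary


class Primary:
    def __init__(self):
        self.state = {}
        self.next_seq = 0

    def put_timestamp(self, key):
        # Read the clock once, on the primary only, and record the result.
        entry = LogEntry(self.next_seq, key, time.time())
        self.next_seq += 1
        self.state[entry.key] = entry.timestamp
        return entry


class Backup:
    def __init__(self):
        self.state = {}
        self.next_seq = 0
        self.pending = {}  # entries that arrived early, keyed by seq

    def receive(self, entry):
        self.pending[entry.seq] = entry
        # Apply every entry that is now contiguous with what was applied.
        while self.next_seq in self.pending:
            e = self.pending.pop(self.next_seq)
            self.state[e.key] = e.timestamp  # replay the recorded value
            self.next_seq += 1


primary, backup = Primary(), Backup()
entries = [primary.put_timestamp(k) for k in ("a", "b", "a")]
for e in reversed(entries):  # simulate out-of-order delivery
    backup.receive(e)
print(backup.state == primary.state)
```

If the backup called `time.time()` itself instead of replaying `e.timestamp`, the two states would almost certainly diverge, which is the non-determinism problem in miniature.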
 
3. Two Approaches
- State transfer ⇒ Send snapshots of the state to the backup
 
- Replicated State Machine ⇒ Only send operations to the backup
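
A toy comparison of the two approaches, assuming a deterministic state machine. The `Counter` class and both `sync_*` helpers are hypothetical and exist only for this sketch. State transfer copies the whole state; the replicated state machine ships just the operations and relies on the backup re-executing them in the same order.

```python
import copy


class Counter:
    """A tiny deterministic state machine: named counters."""

    def __init__(self):
        self.values = {}

    def apply(self, op):
        name, delta = op
        self.values[name] = self.values.get(name, 0) + delta


def sync_by_state_transfer(primary, backup):
    # Ship a full snapshot; cost grows with the size of the state.
    backup.values = copy.deepcopy(primary.values)


def sync_by_replicated_ops(ops, backup):
    # Ship only the operations; the backup re-executes them in order.
    for op in ops:
        backup.apply(op)


ops = [("x", 1), ("y", 5), ("x", 2)]
primary = Counter()
for op in ops:
    primary.apply(op)

backup_a, backup_b = Counter(), Counter()
sync_by_state_transfer(primary, backup_a)
sync_by_replicated_ops(ops, backup_b)
print(backup_a.values == backup_b.values == primary.values)
```

Both backups end up identical to the primary here only because `apply` is deterministic. With non-deterministic operations, the second approach needs the extra care noted under the challenges above.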
 
<aside>
💡 Level of operations to replicate
- Application-level
 
- Machine level ⇒ transparent!
- Then the application doesn’t need to be modified at all
 
- Use virtual machines
</aside>