Log Structured Protocols
Notes from the 2021 paper from Meta/Facebook Research on log structured protocol usage in their Delos system (a distributed database used as part of Facebook's control plane layer). Also covers replicated state machines.
What is Delos?
A distributed database at Facebook which drives their control plane. It was built around the idea that databases are structured as a combination of consensus protocols and replicated state machine.
Drivers
Some of the major drivers were to enable modularity, reuse of code and minimization of dependencies. It was also designed to allow for a diversity in storage options and simplify the reasoning around the HA, fault tolerance and consistency dimensions of different existing databases. It uses a virtual concensus layer which allows for pluggable concensus protocols.
Architecture
It uses an hourglass shaped architecture, which seperates consensus (via a virtual concensus layer), storage and the replicated state machine. In the below diagram, the Delos Platform in pink is essentially made up of stackable replicated state machines.
Source: Delos Presentation
Log Structured Protocols
A log structured protocol is a type of state machine replication using a shared log. The protocol is used to replicate state across different nodes, using an API.
Engines
The state machines are stackable and are often referred to as engines. The state machine closest to the log is referred to as the BaseEngine. The base engine can read from the log, calling an apply()
method of the next highest up state machine.
When something needs to be written to the log, the BaseEngine takes care of this, responding to a call on it's propose()
method.
Source: Delos Presentation
Different state machines in the engine wrap/encapsulate parameters on calls to neighbouring state machines apply()
or propose()
methods with their own headers (piggybacking headers).
At the very top, we have the concept of a wrapper (an inbound request gateway) and an applicator (which provides a response to the apply()
call of the top most engine).
Source: Facebook Research
Additional Engine Functionality
- Drop or filter apply() calls
- Be used to create batching by utilising local state
- Register lifecycle events like
postApply()
with lower layer engines - Use local storage
- Have soft local state (on heap)
Engine Conventions
- Cross layer inspection is discouraged
- Isolated state
- Shouldn't modify state belonging to other engines
New log entries can be generated by an engine, in which case the header is modified to indicate a change from apply()
to propose()
which prevents the upcall of apply()
from continuing upwards towards the Wrapper and Applicator.
Observer Engine
This is an engine placed between layers to allow for things like observability.
Source: Facebook Research
Replicated State Machines
The layered engines in Delos are replicated state machines. These are components which have internal state which can be modified by commands which trigger deterministic behaviour which can mutate internal state and potentially produce output events.
The most important aspect of replicated state machines is that of deterministic behaviour - this means that when we replicate them by replicating their inbound messages, their internal state and output events will be the same. It is this determinism which allows us to better reason about what functionality/state mutation is occuring in multiple (and possibly non-homogenous) components.
Links and Resources
https://research.facebook.com/publications/log-structured-protocols-in-delos/ - "Log-structured Protocols in Delos", ACM Symposium on Operating Systems Principles (SOSP), Facebook Research
https://www.alibabacloud.com/blog/an-introduction-to-the-system-architecture-abstraction-of-the-replicated-state-machine_599279 - "An Introduction to the System Architecture Abstraction of the Replicated State Machine", Alibaba Cloud Blog
https://systems-rg.github.io/slides/2022-06-10-delos-engines.pdf - "Log Structured Protocols in Delos", Prof Abhilash Jindal, Ramita Sardana, IIT Delhi
https://aeron.io/docs/distributed-systems-basics/replicated-state-machines/ - "Replicated State Machines", Aeron.io