Virtual Synchrony
Notes from the classic 1987 paper from Birman and Joseph on Virtual Synchrony, a computation model for distributed systems.
What is Virtual Synchrony?
We are essentially talking about an approach to building highly available systems which is designed to be easy, flexible and more likely to give a correct solution than other approaches.
The kinds of systems we're talking about making with this approach are your typical mission critical applications and so we have other considerations like scheduling of work, replicating data, co-ordinating actions in physically seperate locations, load balancing and other things like dynamic reconfiguration after a component bounces or there is an outage.
The things we need to consider with these kinds of applications includes individual processes, or sites/locations crashing, network partitions, message loss etc. In the 1987 paper, Birman and Joseph also make some assumptions around failing processes not sending incorrect messages and tolerating partition failure.
The approach Birman and Joseph propose is to break down an application into orthogonal components (services lets say) and then provide a toolkit which helps to address the common issues which occur in distributed systems.
Each tool is accessible from your application through an API. Some tools will require the management and persistence of external state. The tools allow you to connect programs which can now be written as though they were designed for a non-distributed environment.
Toolset
The toolset includes:
- Initiating async actions
- Non-blocking updating of replicated data
- Distributed locks
The key to implementing these tools is a communications mechanism (or protocol) which allows us to create processes or groups of processes which are virtually synchronous. All processes observe the same events, in the same order which amongst other things allows us to make assumptions about the actions other processes will take. The tools can be used as if they were synchronous.
Local Decisions
The advantage of focusing on making local decisions is that your program/application doesn't need to interact with any other services/applications before making a decision, therefore there is less delay.
Common Problems
We want async communication among processes, so without shared memory, our most general approach is messaging. This introduces it's own problems:
- Message transmission times can vary
- Messages from a single event can arrive at different processes at different times and in different order
- Failure detection is based on timeouts, so hard to differentiate between system overload and a general transient system failure
A lot of distributed systems are based on RPC with timeouts. Transactions only solve part of the problem.
Problems Solved by Tools
Process Groups and Group Communication
We want to be able to structure a system into groups of processes which may not be identical. Processes should be able to belong to multiple groups and we need a mechanism to join and leave groups as well as being able to communicate with members of a group, even whilst membership is changing.
Request Strategy
There needs to be a strategy for processing requests by a process group. If possible, the response should be generated and send using only local information (see Local Decisions above).
Concurrency
Allow for multiple processes to be taking action at the same time as well as continuing execution after sending an async message.
Synchronization
Concurrently executing processes can need locks, which must also deal with the failure of a process which is holding the lock (using deadlock detection for example).
Replicated Data
Replication of data amongst co-operating processes.
Detecting and Reacting to Failure
Members of a group may need to be informed of member failured. We don't want messages from a failed process after we've figured out the process has failed.
Dynamic Reconfiguration
System configuration must be adjustable on outage or system load.
Stable Storage
To recover state after a failure, we need to be able to periodically checkpoint or use a recovery log/journal.
Recovery
Restart gracefully using stable state on complete system failure. Mechanisms for reincorporating a process which is coming back up after a partial failure - the rest of the system has been continuing to operate whilst it has been down, so now we also need to do things like rehydrate state.
Transactions
We need to be able to write to shared disk in a transactional manner.
Protections
The system must protect itself against actions by erroneous clients.
Consistency
The system should still be correct globally as long as we use locally correct components or algorithms.
Atomic Multicast
We could base such a system and tools on atomic multicast - this means that all destinations either recieve the message or all destinations don't recieve the message. We can't have 50% of the members recieving the message and the other 50% not.
We also need delivery ordering, so messages are seen in the same order by recipients.
This allows processes to keep a consistent view of one another - events like site failures, recoveries etc occur in the same order everywhere. The requirement of ordered delivery ends up being quite costly.
We can apply weaker delivery ordering semantics in situations where an application doesn't require ordering.
ABCAST Primitive
Messages are delivered atomically and in the same order everywhere.
CBCast Primitive
Ordering is preserved on a per-client or per-session basis.
Causally linked messages have guaranteed ordering. If an output message is the result of a particular input message (and associated computation) then recipients can not see the resulting output message before they see the input message which generated the output message.
GBCast Primitive
Use ABCAST to get a distributed lock for mutual exclusion on a resource, then use CBCAST to interact with that resource. Used for group addressing by the system itself.
Links and Resources
https://www.cs.cornell.edu/home/rvr/sys/p123-birman.pdf - "Exploiting Virtual Synchrony in Distributed Systems", Kenneth P. Birman and Thomas A. Joseph, Department of Computer Science, Cornell University
https://people.cs.rutgers.edu/~pxk/417/notes/virtual_synchrony.html - Paul Krzyzanowski, Department of Computer Science, Rutgers
https://www.cs.cornell.edu/home/rvr/papers/StrongWeakVS.pdf - "Strong and Weak Virtual Synchrony in Horus", Friedman & Renesse, Department of Computer Science, Cornell University