
Virtual Synchrony

· 6 min read
Sanjeev Sarda
High Performance Developer

Notes from the classic 1987 paper from Birman and Joseph on Virtual Synchrony, a computation model for distributed systems.


What is Virtual Synchrony?

We are essentially talking about an approach to building highly available systems, one designed to be easier to use, more flexible, and more likely to yield a correct solution than other approaches.

The kinds of systems we're talking about building with this approach are your typical mission-critical applications, so we have other considerations: scheduling of work, replicating data, coordinating actions across physically separate locations, load balancing, and dynamic reconfiguration after a component bounces or there is an outage.

The failure modes we need to consider with these kinds of applications include individual processes or whole sites/locations crashing, network partitions, message loss and so on. In the 1987 paper, Birman and Joseph also make some assumptions around failing processes not sending incorrect messages and around tolerating partition failures.

The approach Birman and Joseph propose is to break down an application into orthogonal components (services, let's say) and then provide a toolkit that helps address the common issues which occur in distributed systems.

Each tool is accessible from your application through an API. Some tools will require the management and persistence of external state. The tools allow you to connect programs which can now be written as though they were designed for a non-distributed environment.

Toolset

The toolset includes:

  • Initiating async actions
  • Non-blocking updating of replicated data
  • Distributed locks

The key to implementing these tools is a communications mechanism (or protocol) which allows us to create processes or groups of processes which are virtually synchronous. All processes observe the same events in the same order, which, amongst other things, allows each process to make assumptions about the actions other processes will take. The tools can then be used as if they were synchronous.
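
A minimal sketch of that idea in Python (the Replica class and event log below are illustrative, not from the paper): if every replica applies the same events in the same order, all replicas end up in the same state, so each one can reason locally about what the others have done.

```python
# "Same events, same order": every replica applies an identical, ordered
# stream of events, so all replicas reach the same state.
EVENT_LOG = [("deposit", 100), ("withdraw", 30), ("deposit", 5)]  # agreed order

class Replica:
    def __init__(self, name):
        self.name = name
        self.balance = 0

    def apply(self, event):
        kind, amount = event
        self.balance += amount if kind == "deposit" else -amount

replicas = [Replica("A"), Replica("B"), Replica("C")]
for event in EVENT_LOG:          # every replica sees every event, in the same order
    for r in replicas:
        r.apply(event)

assert len({r.balance for r in replicas}) == 1   # all replicas agree: 75
```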

Local Decisions

The advantage of focusing on making local decisions is that your program/application doesn't need to interact with any other services/applications before making a decision, therefore there is less delay.

Common Problems

We want async communication among processes, so without shared memory, our most general approach is messaging. This introduces its own problems:

  • Message transmission times can vary
  • Messages from a single event can arrive at different processes at different times and in a different order (see the sketch after this list)
  • Failure detection is based on timeouts, so it is hard to differentiate between system overload and a genuine (possibly transient) failure
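
A toy illustration of the ordering problem (nothing here is from the paper): two concurrent multicasts that arrive in different orders at different processes leave the replicas disagreeing, even though no message was lost.

```python
# Two concurrent multicasts m1 and m2 arrive at different processes in
# different orders; replica state diverges even though nothing was lost.
def apply_all(messages):
    state = []
    for m in messages:
        state.append(m)
    return state

m1, m2 = "set_owner=alice", "set_owner=bob"
p1_view = apply_all([m1, m2])    # process 1 happens to receive m1 first
p2_view = apply_all([m2, m1])    # process 2 happens to receive m2 first
print(p1_view[-1], p2_view[-1])  # the two replicas now disagree on the owner
```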

A lot of distributed systems are based on RPC with timeouts. Transactions only solve part of the problem.

Problems Solved by Tools

Process Groups and Group Communication

We want to be able to structure a system into groups of processes which may not be identical. Processes should be able to belong to multiple groups, and we need mechanisms to join and leave groups, as well as to communicate with the members of a group even whilst membership is changing.
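
A rough sketch of the kind of process-group API implied here; the ProcessGroup and Process names are illustrative, not the ISIS interface.

```python
# A process group supporting join, leave, and multicast while membership changes.
class Process:
    def __init__(self, pid):
        self.pid = pid
        self.inbox = []

    def deliver(self, sender, message):
        self.inbox.append((sender.pid, message))

class ProcessGroup:
    def __init__(self, name):
        self.name = name
        self.members = set()

    def join(self, process):
        self.members.add(process)

    def leave(self, process):
        self.members.discard(process)

    def multicast(self, sender, message):
        # Delivered against the membership in force when the send is ordered,
        # so senders don't have to freeze the group while membership changes.
        for member in list(self.members):
            member.deliver(sender, message)

replicas = ProcessGroup("replicas")
p, q = Process("p"), Process("q")
replicas.join(p); replicas.join(q)
replicas.multicast(p, "update-1")
replicas.leave(q)                    # membership changes between sends
replicas.multicast(p, "update-2")
print(p.inbox)   # [('p', 'update-1'), ('p', 'update-2')]
```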

Request Strategy

There needs to be a strategy for processing requests by a process group. If possible, the response should be generated and sent using only local information (see Local Decisions above).

Concurrency

Allow multiple processes to take action at the same time, and allow a process to continue executing after sending an async message.

Synchronization

Concurrently executing processes may need locks, and the locking mechanism must also deal with the failure of a process that is holding a lock (using deadlock detection, for example).
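
A sketch of lock bookkeeping that survives the failure of the holder, assuming the group delivers a failure notification; the LockManager class and method names are illustrative.

```python
# When the group learns a member has failed, any lock it held is released
# so waiters are not blocked forever.
from collections import deque

class LockManager:
    def __init__(self):
        self.holder = None
        self.waiters = deque()

    def acquire(self, pid):
        if self.holder is None:
            self.holder = pid
            return True          # granted immediately
        self.waiters.append(pid)
        return False             # caller waits for a grant

    def release(self, pid):
        if self.holder == pid:
            self.holder = self.waiters.popleft() if self.waiters else None
            return self.holder   # next holder, if any

    def on_member_failed(self, pid):
        # Driven by the group's failure notification: drop the dead process
        # from the wait queue and release the lock if it was the holder.
        self.waiters = deque(w for w in self.waiters if w != pid)
        if self.holder == pid:
            return self.release(pid)

lm = LockManager()
lm.acquire("p1"); lm.acquire("p2")
print(lm.on_member_failed("p1"))   # 'p2' is granted the lock after p1 fails
```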

Replicated Data

Replication of data amongst co-operating processes.

Detecting and Reacting to Failure

Members of a group may need to be informed of member failures. We also don't want to receive messages from a failed process after we've determined that it has failed.
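
A sketch of that rule, assuming membership changes arrive as view installations: once a view change removes a member, any of its messages that arrive later are dropped rather than delivered. The names are illustrative.

```python
# Discard late messages from members that a view change has already removed.
class ViewedReceiver:
    def __init__(self, initial_view):
        self.view = set(initial_view)   # current group membership
        self.delivered = []

    def install_view(self, new_view):
        self.view = set(new_view)       # every member installs views in the same order

    def receive(self, sender, message):
        if sender not in self.view:
            return False                # late message from a failed member: drop it
        self.delivered.append((sender, message))
        return True

r = ViewedReceiver({"p", "q", "r"})
r.receive("q", "update-1")              # delivered
r.install_view({"p", "r"})              # q reported as failed
print(r.receive("q", "update-2"))       # False: message from q is discarded
```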

Dynamic Reconfiguration

System configuration must be adjustable in response to outages or changes in system load.

Stable Storage

To recover state after a failure, we need to be able to periodically checkpoint or use a recovery log/journal.
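
A minimal sketch of checkpoint-plus-log recovery; the file names and JSON format are arbitrary choices for illustration.

```python
# Persist an occasional snapshot plus a log of updates since the snapshot;
# on restart, load the snapshot and replay the log.
import json, os

STATE_FILE, LOG_FILE = "checkpoint.json", "updates.log"

def checkpoint(state):
    with open(STATE_FILE, "w") as f:
        json.dump(state, f)
    open(LOG_FILE, "w").close()          # log restarts after a checkpoint

def log_update(key, value):
    with open(LOG_FILE, "a") as f:
        f.write(json.dumps({"key": key, "value": value}) + "\n")

def recover():
    state = {}
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            state = json.load(f)
    if os.path.exists(LOG_FILE):
        with open(LOG_FILE) as f:
            for line in f:               # replay updates made after the checkpoint
                entry = json.loads(line)
                state[entry["key"]] = entry["value"]
    return state

checkpoint({"x": 1})
log_update("y", 2)
print(recover())   # {'x': 1, 'y': 2}
```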

Recovery

On a complete system failure, restart gracefully using stable state. We also need mechanisms for reincorporating a process that is coming back up after a partial failure: the rest of the system has continued operating whilst it was down, so we now need to do things like rehydrate its state.

Transactions

We need to be able to write to shared disk in a transactional manner.

Protections

The system must protect itself against actions by erroneous clients.

Consistency

The system should still be correct globally as long as we use locally correct components or algorithms.

Atomic Multicast

We could base such a system and its tools on atomic multicast: either all destinations receive the message or none of them do. We can't have 50% of the members receiving the message and the other 50% not.

We also need delivery ordering, so messages are seen in the same order by recipients.

This allows processes to keep a consistent view of one another - events like site failures, recoveries etc occur in the same order everywhere. The requirement of ordered delivery ends up being quite costly.

We can apply weaker delivery ordering semantics in situations where an application doesn't require ordering.
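
A sketch of that idea as an API choice (not the ISIS interface): the sender picks the ordering guarantee per message, paying for total ordering only when the application needs it.

```python
# Ordering as a per-message choice: cheaper unordered (but still atomic)
# delivery when order is irrelevant, total order only when required.
from enum import Enum

class Ordering(Enum):
    UNORDERED = 1    # atomic (all-or-nothing) delivery only
    TOTAL = 2        # atomic delivery plus the same order everywhere

class Group:
    def __init__(self):
        self.sequence = 0

    def multicast(self, message, ordering):
        if ordering is Ordering.TOTAL:
            self.sequence += 1                   # run the costlier ordering protocol
            return ("total", self.sequence, message)
        return ("unordered", None, message)      # cheaper path when order doesn't matter

g = Group()
print(g.multicast("metrics tick", Ordering.UNORDERED))
print(g.multicast("config change", Ordering.TOTAL))
```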

ABCAST Primitive

Messages are delivered atomically and in the same order everywhere.
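
One common way to sketch a total order is a sequencer that stamps every message with a global sequence number, with receivers delivering strictly in stamp order. This only illustrates the guarantee; the actual ABCAST protocol in the paper works differently.

```python
# Sequencer-based total order: every receiver delivers in stamp order,
# holding back messages that arrive early.
import heapq, itertools

class Sequencer:
    def __init__(self):
        self.counter = itertools.count(1)

    def stamp(self, message):
        return (next(self.counter), message)   # global order decided once

class Receiver:
    def __init__(self):
        self.pending = []      # messages that arrived out of order
        self.next_seq = 1
        self.delivered = []

    def receive(self, stamped):
        heapq.heappush(self.pending, stamped)
        while self.pending and self.pending[0][0] == self.next_seq:
            _, message = heapq.heappop(self.pending)
            self.delivered.append(message)     # deliver strictly in sequence order
            self.next_seq += 1

seq = Sequencer()
m1, m2 = seq.stamp("debit"), seq.stamp("credit")
r = Receiver()
r.receive(m2)          # arrives first but is held back
r.receive(m1)
print(r.delivered)     # ['debit', 'credit'] at every receiver
```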

CBCAST Primitive

Ordering is preserved on a per-client or per-session basis.

Causally linked messages have guaranteed ordering: if an output message is the result of a particular input message (and its associated computation), then recipients cannot see the resulting output message before they see the input message that generated it.
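
A textbook way to enforce causal delivery is with vector clocks: hold a message back until everything that causally precedes it has been delivered. This sketch illustrates the guarantee; it is not the ISIS implementation.

```python
# Causal delivery via vector clocks: a message is deliverable only when it is
# the next message from its sender and everything the sender had already seen
# has been delivered locally.
class CausalReceiver:
    def __init__(self, n):
        self.clock = [0] * n          # messages delivered from each sender
        self.pending = []
        self.delivered = []

    def _deliverable(self, sender, stamp):
        if stamp[sender] != self.clock[sender] + 1:
            return False
        return all(stamp[k] <= self.clock[k] for k in range(len(stamp)) if k != sender)

    def receive(self, sender, stamp, message):
        self.pending.append((sender, stamp, message))
        progress = True
        while progress:
            progress = False
            for item in list(self.pending):
                s, st, msg = item
                if self._deliverable(s, st):
                    self.pending.remove(item)
                    self.delivered.append(msg)
                    self.clock[s] = st[s]
                    progress = True

# p0 sends m1; p1 sees m1 and replies with m2. A receiver that gets m2 first
# must hold it back, so no one sees the reply before the question.
r = CausalReceiver(n=3)
r.receive(1, [1, 1, 0], "m2: reply")       # depends on m1, so it is buffered
r.receive(0, [1, 0, 0], "m1: question")
print(r.delivered)                          # ['m1: question', 'm2: reply']
```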

GBCAST Primitive

GBCAST is used for group addressing by the system itself. A typical pattern for combining the other primitives: use ABCAST to get a distributed lock for mutual exclusion on a resource, then use CBCAST to interact with that resource (sketched below).
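
A sketch of that usage pattern with stand-in abcast/cbcast functions (the real primitives are asynchronous protocols, not local function calls): lock ownership is decided with a totally ordered broadcast so everyone agrees who holds it, then the holder updates the resource with cheaper causally ordered broadcasts.

```python
# Stand-in primitives: pretend abcast gives atomic, totally ordered delivery
# and cbcast gives causal (per-sender) ordering.
def abcast(group, message):
    for member in group:
        member.setdefault("log", []).append(("ABCAST", message))

def cbcast(group, message):
    for member in group:
        member.setdefault("log", []).append(("CBCAST", message))

group = [{"name": "p"}, {"name": "q"}, {"name": "r"}]

abcast(group, "lock(account-42) granted to p")   # everyone agrees on the holder
cbcast(group, "account-42 += 100")               # holder updates the resource
cbcast(group, "account-42 -= 30")
abcast(group, "unlock(account-42)")

print(group[0]["log"][0])    # same first event at every member
```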

https://www.cs.cornell.edu/home/rvr/sys/p123-birman.pdf - "Exploiting Virtual Synchrony in Distributed Systems", Kenneth P. Birman and Thomas A. Joseph, Department of Computer Science, Cornell University

https://people.cs.rutgers.edu/~pxk/417/notes/virtual_synchrony.html - Paul Krzyzanowski, Department of Computer Science, Rutgers

https://www.cs.cornell.edu/home/rvr/papers/StrongWeakVS.pdf - "Strong and Weak Virtual Synchrony in Horus", Roy Friedman and Robbert van Renesse, Department of Computer Science, Cornell University