Skip to main content

ACID BASE and HAT

· 7 min read
Sanjeev Sarda
High Performance Developer

Some notes on ACID, BASE and HAT.

Distributed systems

ACID

Defines the desirable qualities of a traditional transactional database system that provides data validity in the presence of errors:

  • Atomicity - single unit transactions.

  • Consistency - preserve invariants upon transactions.

  • Isolation - as if serial behaviour despite concurrent transactions. The amount of data that is visible in a transaction when the other services access the same data simultaneously.

  • Durability - transactions once commited remain despite system failure.

ACID prioritizes consistency over availability. Entire transactions can fail if at any step there is an error, so the db is left in a consistent state in the event of any errors.

BASE

  • Basically available - data is available most of the time even with partial system failure

  • Soft state - replicas are not always consistent, the system allows for temporarily inconsistent replicas.

  • Eventually consistent - no guarantee when the data will become eventually consistent.

HAT

Highly Available Transactions.

System that remains operational under partition. A system is considered highly available if every transaction that can connect to at least one server for each item in the transaction eventually commits.

Eventual commit for every transaction that can contact at least one sever in the system.

https://jepsen.io/consistency

Jepsen consistency diagram

Strict Serializability

Strict 1SR. Linearizable across multiple objects with process order.

Serializable

Doesn't impose a per process order between transactions - can observe a write, then fail to see that write in a subsequent transaction.

Linearizability

Operations appear to have occured in some real time order. Operations are atomic. Applies to a single object.

reads will return the last completed write to a data item

Causal Consistency

Causally related operations should appear in the same order e.g. publisher 1 request, 2 responders - the request is always seen before any of the responses. Ordering across/between entities.

PRAM

Pipeline Random Access Memory.

You see your own 2 writes in the order you wrote them, but you may not see the correct ordering from other writers. Session order is a subset of visibility order.

Writes Follow Reads

Session causality. You write "value", a process reads "value" then does some other write, let's call it "second value". The write of "second value" must be visible after the write of "value".

Monotonic Reads

Reads cant go backwards. You did some writes which created the state for r1. You do a read r1, then r2. The read at r2 cant observe state prior to r1.

Monotonic Writes

Writes by the same process are observed in order.

Dimensions of Consistency

  • Same process or multiple processes
  • Same node or all nodes
  • Single object or multi entity
  • Visibility of ordering across multiple processes, or a single process
  • When do writes become visible to readers

"Highly Available Transactions: Virtues and Limitations Peter Bailis, Aaron Davidson, Alan Fekete† , Ali Ghodsi, Joseph M. Hellerstein, Ion Stoica UC Berkeley and †University of Sydney"

https://www.vldb.org/pvldb/vol7/p181-bailis.pdf

Semantics, taxonomy + experimental results.

Write Skew

In a write skew anomaly, two transactions (T1 and T2) concurrently read an overlapping data set (e.g. values V1 and V2), concurrently make disjoint updates (e.g. T1 updates V1, T2 updates V2), and finally concurrently commit, neither having seen the update performed by the other.

Snapshot Isolation

All reads in a transaction see a consistent snapshot of the database, the transaction successfully commits only if no intermediate conflicting concurrent updates (CAS style).

In practice snapshot isolation is implemented within multiversion 
concurrency control (MVCC), where generational values of each data
item (versions) are maintained: MVCC is a common way to increase
concurrency and performance by generating a new version of a
database object each time the object is written, and
allowing transactions' read operations of several
last relevant versions (of each object)

Stickiness

"We say that a system provides sticky availability if, whenever 
a client’s transactions is executed against a copy of
database state that reflects all of the client’s prior
operations, it eventually receives a response, even
in the presence of indefinitely long partitions

Bailis Definition of HA

If you can connect to a non failing node and send a message, you're guaranteed a reply.

Replica Availability

A transaction can contact at least one replica for each item it tries to access.

Transactional Availability

Given replica availability, the transaction eventually commits or aborts.

Read Uncommitted

Writes to an object are totally ordered.

Read Uncommitted is easily achieved
by marking each of a transaction’s writes with the same timestamp
(unique across transactions; e.g., combining a client’s ID with a
sequence number) and applying a “last writer wins”
conflict reconciliation policy at each replica

Read Committed

Transactions shouldnt access uncommitted or intermediate versions of data/items. Prevents dirty writes and dirty reads.

It is fairly easy for a HAT system to prevent “Dirty Reads”: if each
client never writes uncommitted data to shared copies of data, then
transactions will never read each others’ dirty data. As a simple
solution, clients can buffer their writes until they commit, or,
alternatively, can send them to servers, who will not deliver their
value to other readers until notified that the writes have been committed.

Repeatable Read

Read from a snapshot.

It is possible to satisfy Item Cut Isolation with high availability by
having transactions store a copy of any read data at the client such
that the values that they read for each item never changes unless
they overwrite it themselves. These stored values can be discarded
at the end of each transaction and can alternatively be accomplished
on (sticky) replicas via multi-versioning

Monotonic Atomic View

Under MAV, once some of the effects of a transaction Ti are 
observed by another transaction Tj , thereafter, all
effects of Ti are observed by Tj .

Timestamp on writes. Reader uses timestamp as last seen sequence number, gets a return value that is at that sequence or higher.

Monotonic Reads + Monotonic Writes = Ordering

Writes Follow Reads -> Addition of causality across sessions

The above guarantees can be achieved by forcing servers to wait
to reveal new writes (say, by buffering them in separate local
storage) until each write’s respective dependencies are visible on all
replicas.

Sticky Availability

  • Read your writes
  • PRAM
  • Causal consistency (MAV)

Lost Updates

Berenson et al. define Lost Update as when one transaction T1
reads a given data item, a second transaction T2 updates the same
data item, then T1 modifies the data item based on its original read
of the data item, “missing” or “losing” T2’s newer update.
Consider a database containing only the following transactions:

T1 ∶ rx(a) wx(a + 2)
T2 ∶ wx(2)

Write Skew

It occurs when one transaction T1 reads a given data item x, a
second transaction T2 reads a different data item y, then T1 writes
to y and commits and T2 writes to x and commits. As an example
of Write Skew, consider the following two transactions:

Reading a variable then committing to a different variable being read by another thread. The other thread is committing to the variable you read but reading the one you're trying to commit too.

https://muratbuffalo.blogspot.com/2022/11/highly-available-transactions-virtues.html