ACID Database Semantics

Written — Updated
  • Atomicity
    • Changes within a transaction are all-or-nothing.
    • This lets you make multiple changes without having to handle failure at each point.
    • Postgres implements this by tracking, for each tuple, what are the first and last transactions that should be able to see it.
      • When there are multiple versions of a tuple active at once, for ongoing transactions, it tracks each one and then as transactions commit these tuples become inactive and are later emptied out during vacuuming.
  • Consistency
    • Each transaction leaves the database in a valid state.
    • There's no way to violate a unique constraint, for example.
    • You can take advantage of this through unique constraints and also through serializable isolation level transaction.
  • Isolation
    • Transactions work on snapshots and two transactions executing at the same time don't interfere with each other.
    • There are a few different levels of isolation which exhibit different interaction phenomena.
    • Interactions
      • Dirty Read
        • In this case, a read from one transaction can read data written by another ongoing transaction. In Postgres, this never actually happens, but the standard allows it in "read uncommitted" mode.
      • Nonrepeatable Read
        • When a transaction commits, other transactions then start to see that data. So a transaction may read the same data twice with different results if another transaction commits while it's running.
      • Phantom Read
        • Similar to nonrepeatable read, but affecting the set of rows returned by a query instead of just the data in those rows.
      • Serialization Anomaly
        • A successful group of transactions may leave the database in a state different from if they ran serially.
    • Isolation Levels
      • Serializable
        • None of the interactions will occur. The result is the same as if all the transactions ran one at a time.
        • As with Repeatable Read, the client should be prepared to retry the transaction.
      • Repeatable Read
        • By the standard, this allows Phantom Read and Serialization Anomaly interactions to occur.
        • In Postgres, only only Serialization Anomaly will actually occur.
        • Realistically, this is implemented by cancelling transactions that would otherwise violate this. The client can then retry the transaction.
      • Read Committted
        • Nonrepeatable Read, Phantom Read, and Serialization Anomaly can occur.
        • In Postgres this is the default.
      • Read Uncommitted
        • All of these interactions are possible.
        • In Postgres, Dirty Read never happens so this is actually the same as "Read Committed".
    • There are performance implications to higher isolation levels, of course, but there are some ways to help here.
      • Transactions can be declared as read only, which simplifies the tracking.
      • Limit the number of connections.
      • Transactions should not be long-running.
      • Don't use SELECT FOR UPDATE and similar locking techniques, inside serializable transactions because the transaction semantics already provide similar protections.
      • Sequential scans greatly increase the risk of serialization failures, so queries should use index scans whenever possible.
  • Durability
    • Committed transactions can not be lost. Instead of fsyncing all the time, this is often implemented using a WAL which is faster to write to, and that allows recovery of any committed data in the canonical data store that was lost in a crash or unexpected reboot.

Thanks for reading! If you have any questions or comments, please send me a note on Twitter. This page was generated by the Roam Research Note Exporter alpha test.