ACID Database Semantics
Written — Updated
- Changes within a transaction are all-or-nothing.
- This lets you make multiple changes without having to handle failure at each point.
- Postgres implements this by tracking, for each tuple, what are the first and last transactions that should be able to see it.
- When there are multiple versions of a tuple active at once, for ongoing transactions, it tracks each one and then as transactions commit these tuples become inactive and are later emptied out during vacuuming.
- Each transaction leaves the database in a valid state.
- There's no way to violate a
uniqueconstraint, for example.
- You can take advantage of this through
uniqueconstraints and also through
serializableisolation level transaction.
- Transactions work on snapshots and two transactions executing at the same time don't interfere with each other.
- There are a few different levels of isolation which exhibit different interaction phenomena.
- Dirty Read
- In this case, a read from one transaction can read data written by another ongoing transaction. In Postgres, this never actually happens, but the standard allows it in "read uncommitted" mode.
- Nonrepeatable Read
- When a transaction commits, other transactions then start to see that data. So a transaction may read the same data twice with different results if another transaction commits while it's running.
- Phantom Read
- Similar to nonrepeatable read, but affecting the set of rows returned by a query instead of just the data in those rows.
- Serialization Anomaly
- A successful group of transactions may leave the database in a state different from if they ran serially.
- Dirty Read
- Isolation Levels
- None of the interactions will occur. The result is the same as if all the transactions ran one at a time.
- As with Repeatable Read, the client should be prepared to retry the transaction.
- Repeatable Read
- By the standard, this allows Phantom Read and Serialization Anomaly interactions to occur.
- In Postgres, only only Serialization Anomaly will actually occur.
- Realistically, this is implemented by cancelling transactions that would otherwise violate this. The client can then retry the transaction.
- Read Committted
- Nonrepeatable Read, Phantom Read, and Serialization Anomaly can occur.
- In Postgres this is the default.
- Read Uncommitted
- All of these interactions are possible.
- In Postgres, Dirty Read never happens so this is actually the same as "Read Committed".
- There are performance implications to higher isolation levels, of course, but there are some ways to help here.
- Transactions can be declared as read only, which simplifies the tracking.
- Limit the number of connections.
- Transactions should not be long-running.
- Don't use
SELECT FOR UPDATEand similar locking techniques, inside serializable transactions because the transaction semantics already provide similar protections.
- Sequential scans greatly increase the risk of serialization failures, so queries should use index scans whenever possible.
- Committed transactions can not be lost. Instead of fsyncing all the time, this is often implemented using a WAL which is faster to write to, and that allows recovery of any committed data in the canonical data store that was lost in a crash or unexpected reboot.
Thanks for reading! If you have any questions or comments, please send me a note on Twitter. And if you enjoyed this, I also have a newsletter where I write about tech thoughts, interesting things I've read, and project updates each Thursday.