Hacker News | sqweek's comments

> This is not "data loss", because the transaction was not ever fully committed. The power failure happened before the commit was confirmed to the application, so there's no way anyone should have expected that the transaction is durable.

In the scenario outlined in the article, technically the lost transactions _were_ fully committed from the application's perspective.

From sqlite's perspective, the updates were successfully fsync'd to the WAL and are now waiting for the next CHECKPOINT operation to be written back to the main database. But sqlite doesn't wait for the main DB to be updated before reporting success to the application, nor should it -- the entire point of the WAL is that once data is fsync'd to the journal, sqlite is confident about its durability.
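
You can see this split between commit and checkpoint directly (a sketch using Python's built-in sqlite3 module; the file names are arbitrary):

```python
import os, sqlite3, tempfile

db = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(db)
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
conn.execute("PRAGMA wal_autocheckpoint=0")   # disable automatic checkpoints

conn.execute("CREATE TABLE kv (k TEXT, v TEXT)")
conn.execute("INSERT INTO kv VALUES ('a', '1')")
conn.commit()  # durable once the WAL is fsync'd; the main db file is untouched

wal_before = os.path.getsize(db + "-wal")        # committed frames live here
conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")  # now copy frames into the main db
wal_after = os.path.getsize(db + "-wal")
print(mode, wal_before, wal_after)
```

The commit returns with the data only in the `-wal` file; the explicit TRUNCATE checkpoint is what finally folds it into the main database and empties the WAL.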

The type of corruption induced in the article violates this assumption: if the data on disk changes after fsync reports success, it can invalidate updates further ahead in the WAL that the application was previously told had been successfully committed (and thus may have triggered other external actions).

To be clear, sqlite is doing the right thing. If any frame in the WAL does not match its checksum then the validity of all subsequent frames is called into question. And if fsync() is lying to this extent there's nothing sqlite can do -- that's a bug in the underlying storage layer.
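
Those chained checksums are easy to walk by hand. Here's a sketch of a verifier written against the documented WAL file format (my own reading of the format docs, not code sqlite ships):

```python
import struct

def wal_valid_commits(path):
    """Count the committed transactions a WAL file can still vouch for,
    by walking the cumulative frame-checksum chain described in the
    SQLite file-format documentation."""
    data = open(path, "rb").read()
    # 32-byte WAL header: all fields are stored big-endian.
    magic, _ver, page_size, _seq, salt1, salt2, h1, h2 = struct.unpack(">8I", data[:32])
    # Magic 0x377f0683 => checksum over big-endian words, 0x377f0682 => little-endian.
    endian = ">" if magic & 1 else "<"

    def cksum(chunk, s1, s2):
        n = len(chunk) // 4
        words = struct.unpack(f"{endian}{n}I", chunk)
        for i in range(0, n, 2):
            s1 = (s1 + words[i] + s2) & 0xFFFFFFFF
            s2 = (s2 + words[i + 1] + s1) & 0xFFFFFFFF
        return s1, s2

    s1, s2 = cksum(data[:24], 0, 0)  # header checksum covers its first 24 bytes
    if (s1, s2) != (h1, h2):
        return 0                     # bad header: the whole WAL is ignored
    commits, off, frame_size = 0, 32, 24 + page_size
    while off + frame_size <= len(data):
        hdr = data[off:off + 24]
        _pgno, db_size, fsalt1, fsalt2, f1, f2 = struct.unpack(">6I", hdr)
        # Each frame folds its first 8 header bytes plus the page image into
        # the running checksum -- so corrupting one frame fails verification
        # for every frame after it too.
        s1, s2 = cksum(hdr[:8] + data[off + 24:off + frame_size], s1, s2)
        if (fsalt1, fsalt2) != (salt1, salt2) or (f1, f2) != (s1, s2):
            break
        if db_size:                  # a nonzero db-size marks a commit frame
            commits += 1
        off += frame_size
    return commits
```

The `break` is the key line: a single mismatched frame ends the walk, exactly the "all subsequent frames are called into question" behaviour described above.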

However I think it does leave us with a legitimate race:

1. application submits transactions 1..N
2. sqlite starts committing transactions 1..N to WAL
3. fsync reports success and data is valid on disk
4. sqlite reports success to application
5. application fires off actions in response to transaction N being committed
6. some external event (hardware?) causes a bit-flip in the WAL on disk affecting transaction 2
7. application closes/crashes without sqlite having an opportunity to CHECKPOINT
8. application restarts
9. sqlite recovers WAL and discards transactions 2..N
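
That race can be simulated without any real hardware fault: snapshot the db and WAL mid-run, flip a bit partway through the WAL, and reopen the snapshot. A sketch using Python's sqlite3 module (paths and the flip offset are arbitrary choices of mine):

```python
import os, shutil, sqlite3, tempfile

workdir = tempfile.mkdtemp()
db = os.path.join(workdir, "main.db")

conn = sqlite3.connect(db)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("PRAGMA wal_autocheckpoint=0")   # keep all frames in the WAL
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
conn.commit()
for i in range(1, 6):                         # transactions 1..5
    conn.execute("INSERT INTO t (v) VALUES (?)", (f"txn-{i}",))
    conn.commit()                             # fsync'd; the app is told it's durable

# Snapshot db + WAL while the connection is open (closing would checkpoint),
# standing in for the on-disk state at the moment of the crash.
crash = os.path.join(workdir, "crash.db")
shutil.copy(db, crash)
shutil.copy(db + "-wal", crash + "-wal")
conn.close()

# Flip one bit mid-WAL: that frame's checksum no longer matches, so
# recovery must discard it and every later frame.
with open(crash + "-wal", "r+b") as f:
    f.seek(os.path.getsize(crash + "-wal") // 2)
    b = f.read(1)[0]
    f.seek(-1, os.SEEK_CUR)
    f.write(bytes([b ^ 0x01]))

recovered = sqlite3.connect(crash)
try:
    count = recovered.execute("SELECT COUNT(*) FROM t").fetchone()[0]
except sqlite3.OperationalError:              # even the CREATE TABLE frame was lost
    count = 0
recovered.close()
print(f"committed 5 rows, recovered {count}")
```

The reopened snapshot reports fewer than the 5 rows the application was told were durable -- the quiet rollback described in step 9.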

There are a lot of caveats here. I think that both the bit-flip and the application crash are required, because while the application is running the WAL contents are likely duplicated in RAM (possibly in OS buffers) and a bit-flip on disk alone may never be observed. The bit-flip also needs to be a silent error from the perspective of the storage layer to not result in an error/warning message from sqlite (e.g. reporting I/O errors via the usual sqlite3_config(SQLITE_CONFIG_LOG, ...) mechanism).

Finally if you're considering this kind of data corruption, there's no need to involve the WAL at all. The same kind of silent bit-flip could equally affect the main database, randomly changing the contents of an arbitrary cell in the database. Sqlite _might_ detect that and report corruption (if it results in violating an index ordering or some other constraint), or it might just pass the data on. But as you summarised:
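
The detection mechanism sqlite does offer here is `PRAGMA integrity_check` (or the cheaper `quick_check`), which walks the b-tree structure and cross-checks indexes against table content -- but since main-database pages carry no per-page checksums by default, a flipped bit inside an unindexed cell's payload can sail straight through it. A minimal invocation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")
conn.commit()
# Verifies page structure and index consistency; returns the single
# row "ok" when no structural damage is found.
result = conn.execute("PRAGMA integrity_check").fetchone()[0]
print(result)  # "ok"
```

"ok" means structurally sound, not bit-for-bit intact -- which is exactly the gap being discussed.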

> Myriad other mechanisms should be protecting against [damage to the disk or a buggy app overwriting bytes] already, and sqlite is assuming those other mechanisms are working, because if not, there's very little sqlite can do about it.

tl;dr: the type of corruption simulated in the article is quite contrived, and sqlite does not protect against cosmic rays or other silent on-disk changes to its database files


