Guarantees in write ahead logging implementation

If one were to issue a sequential series of write(2) calls in Linux/Unix, separated by fdatasync(2), fsync(2) or sync(2), is it guaranteed that the first write() will be committed to disk before the second write()? The following SO post seems to say that no such guarantee can be given, since multiple caching layers are involved. For database systems that guarantee consistency this seems important: in WAL (Write Ahead Logging) recovery, the log must be persisted on disk before the data itself is changed, so that after an application or system failure you can revert to the last known consistent state. How is this ensured and implemented in an actual database system?

asked May 24 '12 04:05 by pjay


People also ask

How do you implement write-ahead logging?

Durability is provided by writing the intended mutation to the write-ahead log (WAL) first, before applying the change to, for example, the in-memory representation. Because the WAL is written first, if the database then crashes, the mutation can be recovered from the log and reapplied if necessary.
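The "log first, then apply" ordering can be sketched with Python's thin wrappers over the same system calls the question mentions. This is a minimal illustration, not how any particular DBMS does it: the record framing (length, CRC32, payload), the redo payload layout (8-byte offset plus new bytes) and the function names `append_log_record` and `logged_update` are all hypothetical. The key point is that the data file is not touched until `fdatasync()` on the log has returned.

```python
import os
import struct
import zlib

# Hypothetical record framing for this sketch: 4-byte length, 4-byte CRC32, payload.
HEADER = struct.Struct("<II")

# os.fdatasync is missing on some platforms (e.g. macOS); fall back to os.fsync there.
_datasync = getattr(os, "fdatasync", os.fsync)


def _write_all(fd: int, buf: bytes) -> None:
    """Loop because write(2) may write fewer bytes than requested."""
    view = memoryview(buf)
    while view:
        n = os.write(fd, view)
        view = view[n:]


def append_log_record(log_fd: int, payload: bytes) -> None:
    """Append one framed record to the WAL and wait until it is on stable storage."""
    _write_all(log_fd, HEADER.pack(len(payload), zlib.crc32(payload)) + payload)
    _datasync(log_fd)  # commit point: the data file must not change before this returns


def logged_update(log_fd: int, data_fd: int, offset: int, value: bytes) -> None:
    """Write-ahead rule: make the redo record durable, then modify the data file."""
    # Hypothetical redo payload: 8-byte target offset followed by the new bytes.
    append_log_record(log_fd, struct.pack("<Q", offset) + value)
    # The in-place write is not synced here; after a crash the record above lets
    # recovery redo it. A real system would flush data pages at a checkpoint.
    os.pwrite(data_fd, value, offset)
```

Usage: open the log with `os.O_WRONLY | os.O_CREAT | os.O_APPEND` and the data file with `os.O_RDWR | os.O_CREAT`, then call `logged_update(log_fd, data_fd, offset, value)` for each change. Deferring the data-file sync is the usual reason WAL is cheap: only the sequential log append has to wait for the device.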

What is the purpose of write ahead logging?

Write-ahead logging (WAL) is a widely used technique for preserving the atomicity and durability of writes. Every change is first recorded in a log on durable storage, before any permanent change is made to the database itself.

What is write ahead logging in SQL Server?

SQL Server uses a write-ahead logging (WAL) algorithm, which guarantees that no data modifications are written to disk before the associated log record is written to disk. This maintains the ACID properties for a transaction.

What is WAL file?

The write-ahead log or "wal" file is a roll-forward journal that records transactions that have been committed but not yet applied to the main database. Details on the format of the wal file are described in the WAL format subsection of the main file format document.
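Rolling forward after a crash can be sketched as a loop that reapplies every intact log record and stops at the first torn or corrupt one. This does not reproduce SQLite's actual wal format; it assumes a hypothetical framing of `<length:u32><crc32:u32><payload>`, where the payload is an 8-byte offset followed by the bytes to write there, and `replay_log` is an illustrative name.

```python
import os
import struct
import zlib

# Hypothetical framing: <length:u32><crc32:u32><payload>,
# where payload = <offset:u64><new bytes>.
HEADER = struct.Struct("<II")
OFFSET = struct.Struct("<Q")


def replay_log(log_bytes: bytes, data_fd: int) -> int:
    """Roll forward: reapply every intact record; stop at the first torn/corrupt one.

    Returns the number of records applied. Redo is idempotent here because each
    record writes absolute bytes at an absolute offset, so replaying twice is safe.
    """
    pos = applied = 0
    while pos + HEADER.size <= len(log_bytes):
        length, crc = HEADER.unpack_from(log_bytes, pos)
        start = pos + HEADER.size
        end = start + length
        payload = log_bytes[start:end]
        # A short or checksum-failing record is the tail of a write cut off by the crash.
        if len(payload) < length or length < OFFSET.size or zlib.crc32(payload) != crc:
            break
        (offset,) = OFFSET.unpack_from(payload)
        os.pwrite(data_fd, payload[OFFSET.size:], offset)
        applied += 1
        pos = end
    os.fsync(data_fd)  # make the recovered state durable before accepting new work
    return applied
```

Idempotence matters because the system may crash again during recovery; recording absolute values (rather than, say, "add 1") lets the same log be replayed any number of times.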


1 Answer

The sync() system call is practically no help whatsoever; it promises to schedule the write-to-disk operations, but that's about all.

The normal technique is to set the appropriate flags when you open() the file descriptor for the disk file: O_DSYNC, O_RSYNC or O_SYNC. Calling fsync() or fdatasync() after the writes gets pretty close to the same effect. You can also look at direct I/O (O_DIRECT on Linux, directio() on Solaris), which is often supported, though it is not standardized at all by POSIX.

Ultimately, the DBMS relies on the O/S to guarantee that data written and synchronized to a disk is secure. As long as the device always returns what the DBMS last wrote, it doesn't matter whether the data is physically on the disk yet or is still in a cache (for example, a battery-backed or otherwise non-volatile cache). If, on the other hand, you have NAS (network attached storage) that doesn't guarantee that what you last wrote (and were told was safe on disk) is returned when you read it, then your DBMS can suffer if it has to do recovery. So choose where you store your DBMS with care, making sure the storage behaves sensibly. If the storage does not behave sufficiently like the hypothetical disk, you can end up losing data.

answered Oct 13 '22 11:10 by Jonathan Leffler