Why do SQL databases use a write-ahead log over a command log?

Tags: sql, database, logging, transactions, voltdb

I read about VoltDB's command log. The command log records the transaction invocations instead of each row change as in a write-ahead log. By recording only the invocation, the command logs are kept to a bare minimum, limiting the impact the disk I/O will have on performance.

Can anyone explain the database theory behind why VoltDB uses a command log and why the standard SQL databases such as Postgres, MySQL, SQL Server, and Oracle use a write-ahead log?


asked Jan 06 '13 10:01

user782220

1 Answer

I think it is better to rephrase the question:

Why does VoltDB, a newer distributed database, use a command log rather than a write-ahead log?

Let's do a thought experiment: imagine you are going to write your own storage/database implementation. Presumably you are advanced enough to abstract away the file system and work with block storage directly, along with some additional optimizations.

Some basic terminology:

State: the stored information at a given point in time
Command: a directive to the storage to change its state

So your database can be pictured as a set of blocks that together hold the current state.

The next step is to execute some command, which moves the storage from its current state to the next one.

Please note several important aspects:

A command may affect many stored entities, so many blocks will get dirty
Next state is a function of the current state and the command

Some intermediate states can be skipped, because it is enough to have a chain of commands instead.
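To make the "next state is a function of the current state and the command" idea concrete, here is a minimal sketch. The command shape (operation, key, amount) and the function names are made up for illustration; they are not VoltDB's API.

```python
from functools import reduce

# Hypothetical command shape: (operation, key, amount).
def apply(state, command):
    """Return the next state; the current state is left untouched."""
    op, key, amount = command
    nxt = dict(state)
    if op == "set":
        nxt[key] = amount
    elif op == "add":
        nxt[key] = nxt.get(key, 0) + amount
    else:
        raise ValueError(f"unknown operation: {op}")
    return nxt

def replay(initial_state, commands):
    """Fold a chain of commands over an initial state."""
    return reduce(apply, commands, initial_state)

initial = {"a": 10, "b": 0}
commands = [("add", "a", -3), ("add", "b", 3), ("set", "c", 7)]
final = replay(initial, commands)
print(final)
```

None of the intermediate states has to be stored: the initial state plus the ordered chain of commands is enough to reach the final one, provided every command is deterministic.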


Finally, you need to guarantee data integrity.

Write-Ahead Logging: the central concept is that state changes should be logged before any heavy update to permanent storage. Following our idea, we can log the incremental changes for each block.
Command Logging: the central concept is to log only the command that is used to produce the state.


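There are pros and cons to both approaches. A write-ahead log contains all the changed data, so recovery can apply it directly. A command log is fast and lightweight to write, but requires additional processing at recovery, because the commands have to be executed again.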
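The trade-off can be sketched in a few lines. This is a toy model under stated assumptions, not how any real engine is implemented: the table is a dict, the "stored procedure" raise_prices is hypothetical, and both logs are plain Python lists.

```python
def raise_prices(table, percent):
    """Hypothetical stored procedure: deterministic, touches every row."""
    return [(key, table[key] * (100 + percent) // 100) for key in sorted(table)]

PROCEDURES = {"raise_prices": raise_prices}

def run_with_wal(table, wal, name, *args):
    changes = PROCEDURES[name](table, *args)
    wal.extend(changes)        # log every row change first ...
    table.update(changes)      # ... then update the stored data

def run_with_command_log(table, command_log, name, *args):
    command_log.append((name, args))   # log only the invocation
    table.update(PROCEDURES[name](table, *args))

def recover_from_wal(snapshot, wal):
    table = dict(snapshot)
    table.update(wal)          # apply logged row values in order, no re-execution
    return table

def recover_from_command_log(snapshot, command_log):
    table = dict(snapshot)
    for name, args in command_log:     # re-execute each command in order
        table.update(PROCEDURES[name](table, *args))
    return table

snapshot = {"apple": 100, "pear": 200, "plum": 300}
wal, command_log = [], []
table_a, table_b = dict(snapshot), dict(snapshot)
for _ in range(2):
    run_with_wal(table_a, wal, "raise_prices", 10)
    run_with_command_log(table_b, command_log, "raise_prices", 10)

print(len(wal), len(command_log))
```

Both recovery functions rebuild the same table from the same snapshot, but the write-ahead log holds one record per changed row while the command log holds one record per invocation. The price is that command-log recovery has to run the procedures again.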

VoltDB: Command Logging and Recovery

The key to command logging is that it logs the invocations, not the consequences, of the transactions. By recording only the invocation, the command logs are kept to a bare minimum, limiting the impact the disk I/O will have on performance.

Additional notes

SQLite: Write-Ahead Logging

The traditional rollback journal works by writing a copy of the original unchanged database content into a separate rollback journal file and then writing changes directly into the database file.

A COMMIT occurs when a special record indicating a commit is appended to the WAL. Thus a COMMIT can happen without ever writing to the original database, which allows readers to continue operating from the original unaltered database while changes are simultaneously being committed into the WAL.
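If you want to see SQLite's WAL mode in action, it can be switched on from Python's standard sqlite3 module with the journal_mode pragma. A minimal sketch, assuming a file-backed database in a temporary directory (the file name demo.db and table t are made up):

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")
    conn = sqlite3.connect(path)
    # The pragma returns the journal mode that is now in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
    conn.execute("INSERT INTO t (v) VALUES ('hello')")
    conn.commit()  # the commit is appended to the separate "-wal" file
    wal_exists = os.path.exists(path + "-wal")
    rows = conn.execute("SELECT v FROM t").fetchall()
    conn.close()

print(mode, wal_exists, rows)
```

WAL mode only applies to file-backed databases, which is why the sketch does not use an in-memory one.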

PostgreSQL: Write-Ahead Logging (WAL)

Using WAL results in a significantly reduced number of disk writes, because only the log file needs to be flushed to disk to guarantee that a transaction is committed, rather than every data file changed by the transaction.

The log file is written sequentially, and so the cost of syncing the log is much less than the cost of flushing the data pages. This is especially true for servers handling many small transactions touching different parts of the data store. Furthermore, when the server is processing many small concurrent transactions, one fsync of the log file may suffice to commit many transactions.
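The "one fsync may suffice to commit many transactions" point can be illustrated with a toy group commit. This is only a sketch of the idea, not PostgreSQL's implementation; the record format and the group_commit helper are assumptions.

```python
import os
import tempfile

def group_commit(log_file, transactions):
    """Append one commit record per transaction, then sync once for the batch."""
    for txn_id, payload in transactions:
        log_file.write(f"{txn_id}\t{payload}\tCOMMIT\n".encode())
    log_file.flush()               # push Python's buffer to the OS
    os.fsync(log_file.fileno())    # a single sync covers every record above
    return len(transactions)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "wal.log")
    with open(path, "ab") as log_file:
        committed = group_commit(log_file, [(1, "a=1"), (2, "b=2"), (3, "c=3")])
    with open(path, "rb") as f:
        lines = f.read().decode().splitlines()

print(committed, lines)
```

The writes are sequential appends to one file, and the expensive sync is paid once for three transactions instead of three times.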

Conclusion

Command Logging:

is faster
has lower footprint
has a heavier "replay" procedure
requires frequent snapshots
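Why the snapshots matter: replay cost grows with the length of the command log, and a snapshot lets the log be truncated. A minimal sketch under assumptions (the CommandLoggedStore class and its methods are hypothetical, and everything lives in memory):

```python
class CommandLoggedStore:
    """Toy store: a snapshot plus the commands logged since that snapshot."""

    def __init__(self):
        self.state = {}
        self.snapshot = {}
        self.log = []

    def execute(self, key, delta):
        self.log.append((key, delta))      # log the invocation first
        self.state[key] = self.state.get(key, 0) + delta

    def take_snapshot(self):
        self.snapshot = dict(self.state)   # persist the full state ...
        self.log.clear()                   # ... so older commands can be dropped

    def recover(self):
        """Rebuild the state; also report how many commands were replayed."""
        state = dict(self.snapshot)
        for key, delta in self.log:
            state[key] = state.get(key, 0) + delta
        return state, len(self.log)

store = CommandLoggedStore()
for i in range(1000):
    store.execute("counter", 1)
_, replayed_without_snapshot = store.recover()

store.take_snapshot()
for i in range(5):
    store.execute("counter", 1)
recovered, replayed_with_snapshot = store.recover()

print(replayed_without_snapshot, replayed_with_snapshot, recovered)
```

Without a snapshot, recovery re-executes every command since the beginning. After a snapshot, only the commands logged since then have to be replayed.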

Write-ahead logging is a technique to provide atomicity and durability (see "Databases on 1 Foot"). Since logging sits on the commit path, the better performance of command logging should also improve transaction processing throughput.


Confirmation

VoltDB Blog: Intro to VoltDB Command Logging

One advantage of command logging over ARIES style logging is that a transaction can be logged before execution begins instead of executing the transaction and waiting for the log data to flush to disk. Another advantage is that the IO throughput necessary for a command log is bounded by the network used to relay commands and, in the case of Gig-E, this throughput can be satisfied by cheap commodity disks.

It is important to remember that VoltDB is distributed by nature, so transactions are somewhat tricky to handle and the performance impact is noticeable.

VoltDB Blog: VoltDB’s New Command Logging Feature

The command log in VoltDB consists of stored procedure invocations and their parameters. A log is created at each node, and each log is replicated because all work is replicated to multiple nodes. This results in a replicated command log that can be de-duped at replay time. Because VoltDB transactions are strongly ordered, the command log contains ordering information as well. Thus the replay can occur in the exact order the original transactions ran in, with the full transaction isolation VoltDB offers. Since the invocations themselves are often smaller than the modified data, and can be logged before they are committed, this approach has a very modest effect on performance. This means VoltDB users can achieve the same kind of stratospheric performance numbers, with additional durability assurances.
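The de-duplication and ordered replay described in that quote could look roughly like this. It is a sketch only: the record layout (txn_id, procedure, params) and the use of a globally ordered transaction id are assumptions for illustration, not VoltDB's actual log format.

```python
def merge_command_logs(node_logs):
    """Merge per-node logs, drop replicated duplicates, restore global order."""
    unique = {}
    for log in node_logs:
        for txn_id, procedure, params in log:
            # The same invocation may be logged on several replicas; keep one copy.
            unique.setdefault(txn_id, (txn_id, procedure, params))
    return [unique[txn_id] for txn_id in sorted(unique)]

def replay(commands):
    """Re-execute hypothetical 'deposit' invocations in transaction order."""
    balances = {}
    for _txn_id, procedure, (account, amount) in commands:
        if procedure != "deposit":
            raise ValueError(f"unknown procedure: {procedure}")
        balances[account] = balances.get(account, 0) + amount
    return balances

node_a = [(1, "deposit", ("alice", 50)), (3, "deposit", ("alice", 5))]
node_b = [(2, "deposit", ("bob", 20)), (1, "deposit", ("alice", 50))]
node_c = [(3, "deposit", ("alice", 5)), (2, "deposit", ("bob", 20))]

merged = merge_command_logs([node_a, node_b, node_c])
balances = replay(merged)
print(merged, balances)
```

Each invocation is applied exactly once, in the original order, even though every node logged only part of the workload and some entries appear on more than one node.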


answered Oct 11 '22 01:10

Renat Gilmanov
