Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Compression to Improve Hard Disk Write Performance

On a modern system can local hard disk write speeds be improved by compressing the output stream?

This question derives from a case I'm working with where a program serially generates and dumps around 1-2GB of text logging data to a raw text file on the hard disk and I think it is IO bound. Would I expect to be able to decrease runtimes by compressing the data before it goes to disk or would the overhead of compression eat up any gain I could get? Would having an idle second core affect this?

I know this would be affected by how much CPU is being used to generate the data so rules of thumb on how much idle CPU time would be needed would be good.


I recall a video talk where someone used compression to improve read speeds for a database but IIRC compressing is a lot more CPU intensive than decompressing.

like image 508
BCS Avatar asked Jan 10 '09 19:01

BCS


1 Answers

Yes, yes, yes, absolutely.

Look at it this way: take your maximum contiguous disk write speed in megabytes per second. (Go ahead and measure it, time a huge fwrite or something.) Let's say 100mb/s. Now take your CPU speed in megahertz; let's say 3Ghz = 3000mhz. Divide the CPU speed by the disk write speed. That's the number of cycles that the CPU is spending idle, that you can spend per byte on compression. In this case 3000/100 = 30 cycles per byte.

If you had an algorithm that could compress your data by 25% for an effective 125mb/s write speed, you would have 24 cycles per byte to run it in and it would basically be free because the CPU wouldn't be doing anything else anyway while waiting for the disk to churn. 24 cycles per byte = 3072 cycles per 128-byte cache line, easily achieved.

We do this all the time when reading optical media.

If you have an idle second core it's even easier. Just hand off the log buffer to that core's thread and it can take as long as it likes to compress the data since it's not doing anything else! The only tricky bit is you want to actually have a ring of buffers so that you don't have the producer thread (the one making the log) waiting on a mutex for a buffer that the consumer thread (the one writing it to disk) is holding.

like image 180
Crashworks Avatar answered Sep 16 '22 15:09

Crashworks