Background:
I'm developing a database-related program, and I need to flush dirty metadata from memory to disk sequentially. /dev/sda1 is in volume format, so data on /dev/sda1 is accessed block by block, and blocks that are accessed sequentially are physically adjacent. I use direct I/O, so the I/O bypasses the caching mechanism of the file system and accesses the blocks on the disk directly.
Problems:
After opening /dev/sda1, I read one block, update it, and write it back to the same offset from the beginning of /dev/sda1, repeating this for every block.
The code looks like this:
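// block_size = 256 KB; buffer must be suitably aligned for O_DIRECT
int file = open("/dev/sda1", O_RDWR|O_LARGEFILE|O_DIRECT);
for (int i = 0; i < N; i++) {
    // Cast to off_t so the offset does not overflow int on large devices
    pread(file, buffer, block_size, (off_t)i * block_size);
    // Update the buffer
    pwrite(file, buffer, block_size, (off_t)i * block_size);
}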
I found that if I don't do pwrite, read throughput is 125 MB/s.
If I do pwrite, read throughput will be 21 MB/s, and write throughput is 169 MB/s.
If I do pread after pwrite, write throughput is 115 MB/s, and read throughput is 208 MB/s.
I also tried read()/write() and aio_read()/aio_write(), but the problem remains. I don't understand why writing after reading at the same position of a file makes the read throughput so low.
If I access more blocks at a time, like this:
pread(file, buffer, num_blocks * block_size, i*block_size);
the problem is mitigated; please see the chart.
And I use direct I/O, so the I/O will bypass the caching mechanism of the file system and access directly the blocks on the disk.
If there is no file system on the device and you read and write the device directly, then no file system cache comes into the picture.
The behavior you observed is typical of disk access and I/O behavior.
I found that if I don't do pwrite, read throughput is 125 MB/s
Reason: The disk only reads data; it doesn't have to go back to the same offset and write data, so there is one less operation per block.
If I do pwrite, read throughput will be 21 MB/s, and write throughput is 169 MB/s.
Reason: Your disk might have better write speed; the disk's buffer is probably caching the writes rather than hitting the media directly.
If I do pread after pwrite, write throughput is 115 MB/s, and read throughput is 208 MB/s.
Reason: Most likely the written data is being cached at the disk level, so the read gets its data from the cache instead of the media.
To get optimal performance, you should use asynchronous I/O and transfer a number of blocks at a time. However, you have to use a reasonable number of blocks rather than a very large one. You should find the optimal number by trial and error.
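As a rough illustration of the "number of blocks at a time" part, here is a minimal sketch in C. The names rewrite_in_chunks and update_block are hypothetical, the 4096-byte buffer alignment is an assumption (the real requirement for O_DIRECT depends on the device and kernel), and short reads or writes are simply treated as errors. It is synchronous; it only shows the batching, not asynchronous I/O.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-block update; here it only increments the first byte. */
static void update_block(unsigned char *block, size_t block_size) {
    (void)block_size;
    block[0]++;
}

/* Read, update, and write back total_blocks blocks, blocks_per_io at a time.
 * fd may be any descriptor that supports pread/pwrite; for direct I/O it
 * would be opened with O_DIRECT by the caller.
 * Returns 0 on success, -1 on any error or short transfer. */
int rewrite_in_chunks(int fd, size_t block_size, size_t blocks_per_io,
                      size_t total_blocks) {
    void *buf = NULL;
    if (block_size == 0 || blocks_per_io == 0)
        return -1;
    /* Assumed alignment of 4096 bytes; O_DIRECT needs an aligned buffer. */
    if (posix_memalign(&buf, 4096, block_size * blocks_per_io) != 0)
        return -1;

    for (size_t first = 0; first < total_blocks; first += blocks_per_io) {
        size_t n = total_blocks - first;      /* blocks left to process */
        if (n > blocks_per_io)
            n = blocks_per_io;
        size_t bytes = n * block_size;
        off_t off = (off_t)first * (off_t)block_size;

        /* One large read instead of n small ones. */
        if (pread(fd, buf, bytes, off) != (ssize_t)bytes) {
            free(buf);
            return -1;
        }
        for (size_t b = 0; b < n; b++)
            update_block((unsigned char *)buf + b * block_size, block_size);
        /* One large write back to the same offset. */
        if (pwrite(fd, buf, bytes, off) != (ssize_t)bytes) {
            free(buf);
            return -1;
        }
    }
    free(buf);
    return 0;
}
```

With the block size from the question, a call might look like rewrite_in_chunks(file, 256 * 1024, 8, N), and the value 8 is exactly the kind of number you would tune by trial and error.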