
Read files by device/inode order?

I'm looking for an efficient way to read a large number of files from disk. If I sort the files by device and then by inode, will I get a speed improvement over reading them in their natural (directory listing) order?

Paulo Freitas asked Jan 20 '23 08:01


1 Answer

There are vast speed improvements to be had from reading files in physical order from rotating storage. The operating system's I/O scheduler only does real work when several processes or threads are contending for I/O, because it has no information about which files you plan to read next. Hence, apart from simple read-ahead, it usually doesn't help you at all.

Furthermore, Linux worsens your access patterns during directory scans by returning directory entries to user space in hash table order rather than physical order. Luckily, Linux also provides system calls to determine the physical location of a file, and whether or not a file is stored on a rotational device, so you can recover some of the losses. See for example this patch I submitted to dpkg a few years ago:

http://lists.debian.org/debian-dpkg/2009/11/msg00002.html

This patch does not incorporate a test for rotational devices, because this feature was not added to Linux until 2012:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=ef00f59c95fe6e002e7c6e3663cdea65e253f4cc
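
For illustration, here is a rough Python sketch of a rotational-device check. Note that it does not use the interface from the commit above: it reads the sysfs attribute `queue/rotational` instead, which is a different route to the same information. The function name `is_rotational` is made up for this example, and the result is only a best-effort guess (it returns `None` when the file system has no backing block device visible in sysfs).

```python
import os


def is_rotational(path):
    """Best-effort guess: True/False read from sysfs, None if unknown."""
    dev = os.stat(path).st_dev
    node = f"/sys/dev/block/{os.major(dev)}:{os.minor(dev)}"
    # A partition has no queue/ directory of its own, but its parent disk
    # does, so try the device itself first and then its parent.
    for candidate in (node, os.path.join(node, "..")):
        try:
            with open(os.path.join(candidate, "queue", "rotational")) as f:
                return f.read().strip() == "1"
        except OSError:
            continue
    return None
```

A caller could skip the sorting step entirely when this returns `False`, since ordering by physical location mostly pays off when seeks are expensive.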

I also used to run a patched version of mutt that would scan Maildirs in physical order, usually giving a 5x-10x speed improvement.
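
A cheap approximation of this, and what the question asks about, is to sort directory entries by inode number before touching the files. The sketch below is not the mutt patch; `scan_in_inode_order` is a hypothetical helper, and inode order only approximates physical order (how well depends on the file system).

```python
import os


def scan_in_inode_order(directory):
    """Return paths of regular files in `directory`, sorted by inode number."""
    with os.scandir(directory) as entries:
        # On Linux the inode number usually comes straight from the
        # directory entry, so no extra stat() call is needed per file.
        files = [
            (entry.inode(), entry.path)
            for entry in entries
            if entry.is_file(follow_symlinks=False)
        ]
    files.sort()
    return [path for _, path in files]
```

All entries of one directory live on the same device, so the device only needs to enter the sort key when you merge listings from several mount points.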

Note that inodes are small, heavily prefetched and cached, so opening files to get their physical location before reading them is well worth the cost. Common tools like tar, rsync, cp and PostgreSQL do not use these techniques, and the simple truth is that this makes them unnecessarily slow.
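
As a rough sketch of "get the physical location, then read in that order", the Python below asks the kernel for each file's first extent with the FIEMAP ioctl and sorts on it. This is an illustration under assumptions, not the code from the dpkg patch: the helper names are made up, the ioctl number and struct layouts are transcribed from `<linux/fs.h>` and `<linux/fiemap.h>`, and files whose file system does not support FIEMAP simply fall back to inode order.

```python
import fcntl
import os
import struct

FS_IOC_FIEMAP = 0xC020660B  # _IOWR('f', 11, struct fiemap)
# struct fiemap: fm_start, fm_length, fm_flags, fm_mapped_extents,
#                fm_extent_count, fm_reserved
FIEMAP_HEADER = struct.Struct("=QQIIII")
# struct fiemap_extent: fe_logical, fe_physical, fe_length,
#                       2 x reserved64, fe_flags, 3 x reserved32
FIEMAP_EXTENT = struct.Struct("=QQQQQIIII")


def first_physical_offset(path):
    """Device byte offset of the file's first extent, or None if unavailable."""
    buf = bytearray(FIEMAP_HEADER.size + FIEMAP_EXTENT.size)
    # Ask about the whole file, but leave room for only one extent.
    FIEMAP_HEADER.pack_into(buf, 0, 0, 0xFFFFFFFFFFFFFFFF, 0, 0, 1, 0)
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError:
        return None
    try:
        fcntl.ioctl(fd, FS_IOC_FIEMAP, buf)  # kernel fills buf in place
    except OSError:
        return None  # e.g. file system without FIEMAP support
    finally:
        os.close(fd)
    mapped_extents = FIEMAP_HEADER.unpack_from(buf, 0)[3]
    if mapped_extents == 0:
        return None  # empty or fully sparse file
    return FIEMAP_EXTENT.unpack_from(buf, FIEMAP_HEADER.size)[1]


def sort_by_physical_location(paths):
    """Sort paths by (device, first extent offset), unmapped files last."""
    def key(path):
        st = os.stat(path)
        offset = first_physical_offset(path)
        # Offsets are only comparable within one device; files without a
        # mapping sort after mapped ones, ordered by inode number.
        if offset is None:
            return (st.st_dev, True, st.st_ino)
        return (st.st_dev, False, offset)
    return sorted(paths, key=key)
```

Only the first extent is consulted, which is a simplification: a heavily fragmented file will still cause seeks while it is being read.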

mortehu answered Jan 31 '23 08:01