
How to improve performance of file reading by multiple threads?

Tags: c, linux, unix

I need to read a single file from multiple threads under Linux. The workload is read-only; nothing is ever written. A read does not cover the whole file: each one fetches one or more portions of it, and I store the offset of each portion beforehand. The file is too large to fit in main memory.

For example, many users want to read from this file, and I use a thread or a process to read the file and answer their requests. What happens under Linux? Are all the read operations queued, with the OS completing them one by one? Is it possible to improve the performance of such operations?

I'm trying to implement a simple inverted index of the kind used in information retrieval. I keep the dictionary in memory and the posting lists in files, with each file containing a segment of the index. In the dictionary, I can store something like an offset that points to the position of the word's posting list. When 100 users search within one second, they submit different queries, so each read touches a different part of the file.
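
To make the layout described above concrete, here is a minimal sketch of an in-memory dictionary that maps a term to the position of its posting list. The names `dict_entry` and `dict_lookup`, the `length` field, and the choice of a sorted array searched with `bsearch()` are all assumptions made for illustration; the question does not say how the dictionary is actually stored.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical dictionary entry: where a term's posting list lives
 * inside one index segment file. */
struct dict_entry {
    const char *term;   /* the indexed word */
    off_t       offset; /* byte offset of its posting list in the file */
    size_t      length; /* size of the posting list in bytes (assumed field) */
};

/* bsearch() passes the search key first and the array element second. */
static int cmp_entry(const void *key, const void *elem)
{
    const struct dict_entry *e = elem;
    return strcmp((const char *)key, e->term);
}

/* Look up a term in an array sorted by term (strcmp order).
 * Returns the matching entry, or NULL if the term is not indexed. */
const struct dict_entry *dict_lookup(const struct dict_entry *dict,
                                     size_t n, const char *term)
{
    assert(term != NULL);
    return bsearch(term, dict, n, sizeof dict[0], cmp_entry);
}
```

The dictionary is never modified after it is built, so any number of threads can call `dict_lookup()` at the same time without locking.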

Stephen Hsu asked Feb 28 '23 06:02


2 Answers

Try to implement it in the simplest possible way to start with - let the OS deal with making it efficient by caching etc. See what the performance is like - it may well not turn out to be the bottleneck at all. OSes are generally good at this sort of thing :)

Assuming you are able to open the file multiple times for shared reading, I'd expect it to work fine, without all the read operations being queued.
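
One simple way to try this is `pread()`, which takes the file offset as an argument and does not touch the descriptor's shared file position, so several threads can read different regions through the same descriptor without coordinating. The sketch below is an illustration only: `posting_ref`, `read_job`, `read_posting`, and `read_job_run` are hypothetical names, and it assumes the caller already knows each posting list's offset and length.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* open(), used by the caller */
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical description of one region to read. */
struct posting_ref {
    off_t  offset;
    size_t length;
};

/* Read exactly ref->length bytes starting at ref->offset into buf.
 * Returns 0 on success, -1 on an I/O error or if the file ends early. */
int read_posting(int fd, const struct posting_ref *ref, char *buf)
{
    size_t done = 0;

    assert(ref != NULL && buf != NULL);
    while (done < ref->length) {
        ssize_t n = pread(fd, buf + done, ref->length - done,
                          ref->offset + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            return -1;          /* end of file before the region was complete */
        done += (size_t)n;
    }
    return 0;
}

/* One unit of work for a worker thread (hypothetical). */
struct read_job {
    int                fd;     /* descriptor shared by all workers */
    struct posting_ref ref;    /* region this worker should read */
    char              *buf;    /* caller-owned buffer of ref.length bytes */
    int                status; /* set to the result of read_posting() */
};

/* Thread entry point with the signature pthread_create() expects. */
void *read_job_run(void *arg)
{
    struct read_job *job = arg;
    job->status = read_posting(job->fd, &job->ref, job->buf);
    return NULL;
}
```

A worker could then be started with `pthread_create(&tid, NULL, read_job_run, &job)` and collected with `pthread_join(tid, NULL)`, building with `-pthread`. Measure this plain version first: regions that are already in the page cache are served from memory.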

Jon Skeet answered Mar 05 '23 15:03


How big is your file that it won't all fit in memory?

It would be most efficient to punt to the OS: use mmap() to map the file into (virtual) memory, and then let the threads all access the file via memory. If you're on a 32-bit machine, that limits your file size to something under 4 GB, but probably well over 2 GB, because the whole mapping has to fit in the process's address space; if you're on a 64-bit machine, you aren't really limited except by disk space.

Note that the file need not all be in physical memory with mmap(); the kernel loads pages from disk as they are touched. Logically, however, the whole file is there in the address space.
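
As a rough sketch of that approach, the code below maps a file read-only once and hands out pointers into the mapping. The names `mapped_file`, `map_file`, `posting_at`, and `unmap_file` are hypothetical helpers invented for this example, and it assumes the file is non-empty and is not truncated while it is mapped.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical handle for a read-only mapping of one index file. */
struct mapped_file {
    const char *data;
    size_t      size;
};

/* Map the whole file read-only. Returns 0 on success, -1 on failure. */
int map_file(const char *path, struct mapped_file *out)
{
    struct stat st;
    void *p;
    int fd;

    assert(path != NULL && out != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return -1;
    out->data = p;
    out->size = (size_t)st.st_size;
    return 0;
}

/* Return a pointer to `length` bytes at `offset` inside the mapping,
 * or NULL if the requested range does not lie within the file. */
const char *posting_at(const struct mapped_file *mf,
                       size_t offset, size_t length)
{
    if (offset > mf->size || length > mf->size - offset)
        return NULL;
    return mf->data + offset;
}

/* Release the mapping and clear the handle. */
void unmap_file(struct mapped_file *mf)
{
    munmap((void *)mf->data, mf->size);
    mf->data = NULL;
    mf->size = 0;
}
```

Because the mapping is read-only and never changes, threads can call `posting_at()` concurrently without locking; the first access to a page that is not resident simply blocks that thread while the kernel reads it in.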

Jonathan Leffler answered Mar 05 '23 15:03