I have huge meteorological files, far too big to fit in RAM.
I need to perform a lot of concurrent random reads, so I think SSD + mmap could improve performance.
But what about concurrent mmap reads? How should they be organized?
Is there a concurrency reason (contention for data structures and resources shared between threads) why you would want to open the same files independently in different threads? If not, I can't see a reason for doing that. It will just make the kernel work a little harder by having to track a bunch of different memory mappings (one per thread) that all ultimately map to the same object, consume more file descriptors (no big deal unless you have a very large number of files), and consume more address space when you mmap the same files multiple times.
If I understand your scenario correctly, the files are opened infrequently, read a lot, and then closed infrequently, so I don't think you would have much contention between threads. Go with opening and mapping each file once, globally for all threads.
Regardless of whether you have contention between threads for the housekeeping of the open files, there is one overriding reason in favor of mapping each file only once per process: a 32-bit address space. In 32-bit mode, address space is quite a limited resource if your files are large and you want to mmap significant portions of them. In that case you most certainly need to conserve address space by not wastefully mapping the same file twice in two different threads.
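Here is a minimal sketch of that layout in C, assuming POSIX threads; the file name `weather.dat`, the thread count, and the toy read loop are placeholders, not part of your setup. The file is opened and mmap'd exactly once, and every thread does its own random reads through the single shared, read-only mapping. The `madvise(MADV_RANDOM)` hint is optional but fits a random-access workload on an SSD.

```c
#include <fcntl.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define NUM_THREADS 4   /* placeholder thread count */

static const unsigned char *g_data;  /* shared, read-only mapping */
static size_t g_size;

static void *reader(void *arg)
{
    unsigned seed = (unsigned)(uintptr_t)arg;
    unsigned long sum = 0;

    /* Toy access pattern: each thread does its own random reads.
     * No locking is needed because the mapping is read-only. */
    for (int i = 0; i < 1000000; i++) {
        size_t off = (size_t)rand_r(&seed) % g_size;
        sum += g_data[off];
    }
    printf("thread %u checksum %lu\n", (unsigned)(uintptr_t)arg, sum);
    return NULL;
}

int main(void)
{
    /* "weather.dat" stands in for one of the large files. */
    int fd = open("weather.dat", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }
    g_size = (size_t)st.st_size;

    /* One mapping for the whole process; every thread reads through it. */
    g_data = mmap(NULL, g_size, PROT_READ, MAP_SHARED, fd, 0);
    if (g_data == MAP_FAILED) { perror("mmap"); return 1; }
    close(fd);  /* the mapping stays valid after the descriptor is closed */

    /* Hint that accesses are random, so aggressive readahead is wasted. */
    madvise((void *)g_data, g_size, MADV_RANDOM);

    pthread_t tid[NUM_THREADS];
    for (long t = 0; t < NUM_THREADS; t++)
        pthread_create(&tid[t], NULL, reader, (void *)(t + 1));
    for (long t = 0; t < NUM_THREADS; t++)
        pthread_join(tid[t], NULL);

    munmap((void *)g_data, g_size);
    return 0;
}
```

Because the pages are mapped read-only and shared, the threads never need to synchronize with each other; the kernel's page cache does the actual I/O and serves concurrent faults from different threads independently.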