mmap() vs read()

I'm writing a bulk ID3 tag editor in C. ID3 tags are usually at the beginning of an MP3-encoded file, although older (version 1) tags are at the end. The app is designed to accept a directory and a frame ID list from the command line, then recurse the directory structure, updating all the ID3 tags it finds. The user may additionally choose to remove all older (version 1) tags. Another option is to simply display the current tags without performing an update. The directory might contain 2 files or 2 million. If the user means to update the files, I was planning to load the entire file into memory, perform the updates, then save it (the file may be renamed as well). However, if the user only means to print the current ID3 tags, then loading the entire file seems excessive. After all, the file could be 200 MB.

I've read through this thread, which was insightful: mmap() vs. reading blocks

So my question is, what is the most efficient way to go about this: read(), mmap(), or some combination? Design ideas welcome.

Edit: It's my understanding that mmap() essentially delegates loading a file into memory to the virtual memory subsystem. It seems to me that the VMM would be highly optimized on most systems, as it's critical for system performance.

asked Apr 07 '11 22:04

J. Andrew Laughlin

2 Answers

It really depends on what you're trying to do. If all you need to do is hop to a known offset and read out a small tag, read() may be faster (mmap() has to do some rather complex internal accounting). If you are planning on copying out all 200 MB of the MP3, however, or scanning it for some tag that may appear at an unknown offset, then mmap() is likely a faster approach.
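As a rough illustration of the "hop to a known offset" case, here is a minimal sketch that reads only the 10-byte ID3v2 header with pread(). The function name read_id3v2_header is made up for this example, and it assumes the usual ID3v2 layout (the bytes "ID3", a version byte, a revision byte, a flags byte, then a 4-byte synchsafe size); extended headers and footers are ignored.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if an ID3v2 header is present,
 * 0 if there is none, -1 on an I/O error. */
int read_id3v2_header(const char *path, unsigned *major, uint32_t *tag_size)
{
    unsigned char h[10];
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* One small positioned read at offset 0; the rest of the file is never touched. */
    ssize_t n = pread(fd, h, sizeof h, 0);
    close(fd);
    if (n < 0)
        return -1;
    if (n < (ssize_t)sizeof h || memcmp(h, "ID3", 3) != 0)
        return 0;

    /* Synchsafe integers keep the top bit of every byte clear. */
    if ((h[6] | h[7] | h[8] | h[9]) & 0x80)
        return 0;

    *major = h[3];
    *tag_size = ((uint32_t)h[6] << 21) | ((uint32_t)h[7] << 14) |
                ((uint32_t)h[8] << 7)  |  (uint32_t)h[9];
    return 1;
}
```

With the size in hand, a display-only run could pread() just that many further bytes instead of loading the whole file.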

For example, if you need to shift the entire file down a few hundred bytes in order to insert an ID3 tag, one simple approach would be to expand the file with ftruncate(), mmap the file, then memmove() the contents down a bit. This, however, will destroy the file if your program crashes while it's running. You could also copy the contents of the file into a new file - this is another place where mmap() really shines; you can simply mmap() the old file, then copy all of its data into the new file with a single write().
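The copy-into-a-new-file approach might look something like the sketch below. The names copy_with_prefix and write_all are invented for this example, and the "prefix" stands in for whatever new tag bytes you have built. One caveat: write() is allowed to return a short count, so the "single write()" is wrapped in a loop here.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Keep calling write() until everything is out or a real error occurs. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: write prefix (e.g. a new tag) followed by the
 * full contents of src into dst. Returns 0 on success, -1 on error. */
int copy_with_prefix(const char *src, const char *dst,
                     const void *prefix, size_t prefix_len)
{
    int rc = -1;
    struct stat st;
    void *map = NULL;

    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    if (fstat(in, &st) < 0) {
        close(in);
        return -1;
    }

    size_t len = (size_t)st.st_size;
    if (len > 0) {                      /* mmap() rejects zero-length mappings */
        map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, in, 0);
        if (map == MAP_FAILED) {
            close(in);
            return -1;
        }
    }

    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out >= 0) {
        if (write_all(out, prefix, prefix_len) == 0 &&
            (len == 0 || write_all(out, map, len) == 0))
            rc = 0;
        if (close(out) < 0)
            rc = -1;
    }

    if (map != NULL)
        munmap(map, len);
    close(in);
    return rc;
}
```

Because the original is only ever read, a crash midway leaves it intact; you would rename() the new file over the old one only after the copy succeeds.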

In short, mmap() is great if you're doing a large amount of IO in terms of total bytes transferred; this is because it reduces the number of copies needed, and can significantly reduce the number of kernel entries needed for reading cached data. However mmap() requires a minimum of two trips into the kernel (three if you clean up the mapping when you're done!) and does some complex internal kernel accounting, and so the fixed overhead can be high.

read() on the other hand involves an extra memory-to-memory copy, and can thus be inefficient for large I/O operations, but is simple, and so the fixed overhead is relatively low. In short, use mmap() for large bulk I/O, and read() or pread() for one-off, small I/Os.
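For the version 1 tags mentioned in the question, a one-off small read is about all that's needed. A minimal sketch, assuming the standard ID3v1 layout (a 128-byte block at the very end of the file, starting with "TAG", with a 30-byte title right after); read_id3v1_title is a made-up name:

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define ID3V1_SIZE 128

/* Hypothetical helper: returns 1 and fills title if an ID3v1 tag is
 * present, 0 if there is no tag, -1 on an I/O error. */
int read_id3v1_title(const char *path, char title[31])
{
    unsigned char tag[ID3V1_SIZE];
    struct stat st;
    int found = 0;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    if (st.st_size >= ID3V1_SIZE) {
        /* A single positioned read of the last 128 bytes. */
        ssize_t n = pread(fd, tag, sizeof tag, st.st_size - ID3V1_SIZE);
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == ID3V1_SIZE && memcmp(tag, "TAG", 3) == 0) {
            memcpy(title, tag + 3, 30);   /* title field: bytes 3..32 */
            title[30] = '\0';
            found = 1;
        }
    }

    close(fd);
    return found;
}
```

Stripping a version 1 tag would then be a matter of ftruncate()ing the file to st_size - 128 once the "TAG" marker has been confirmed.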

answered Dec 18 '22 04:12

bdonlan

Don't bother with mmap unless your code is CPU bound, specifically due to lots of small reads and writes. mmap may sound nice, but it isn't the awesome "why isn't everyone using this?" alternative it looks like.

Given that you're recursing through potentially large directory structures, your bottleneck will be directory IO and concurrency. mmap is not going to help.

Update0

The linked question has an answer that supports my experience:

mmap() vs. reading blocks

answered Dec 18 '22 06:12

Matt Joiner
