How is fseek() implemented in the filesystem?

This is not a pure programming question, but it affects the performance of programs that use fseek(), so it is important to know how it works. (A little disclaimer so that it doesn't get closed.)

I am wondering how efficient it is to insert data in the middle of a file. Suppose I have a file with 1MB of data and I insert something at the 512KB offset. How efficient would that be compared to appending my data at the end of the file? Just to make the example complete, let's say I want to insert 16KB of data.

I understand that the answer varies depending on the filesystem, but I assume that the techniques used in common filesystems are quite similar, and I just want to get the right notion of it.

asked Mar 13 '10 15:03

pajton

2 Answers

(disclaimer: I just want to add some hints to this interesting discussion) IMHO there are some things to take into account:

1) fseek is not a primary system service, but a library function. To evaluate its performance we must consider how the file stream library is implemented. In general, the file I/O library adds a layer of buffering in user space, so the performance of fseek may be quite different depending on whether the target position is inside or outside the current buffer. Also, the system services that the I/O library uses may vary a lot. For example, on some systems the library makes extensive use of file memory mapping where possible.

2) As you said, different filesystems may behave very differently. In particular, I would expect a transactional filesystem to do something very smart, and perhaps expensive, to be prepared for a possible rollback of an aborted write operation in the middle of a file.

3) Modern OSes have very aggressive caching algorithms. An "fseeked" file is likely to be already present in the cache, so operations become much faster. But performance may degrade a lot if the overall filesystem activity produced by other processes becomes significant.
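To make point 1 a bit more concrete, here is a minimal sketch, assuming a POSIX system where `fileno()` and `lseek()` are available. The helper name `positions_after_one_read` is made up for illustration. It reads a single byte through stdio and then compares the position reported by the stream (`ftell`) with the offset the kernel holds for the underlying descriptor. On a buffered stream the kernel offset is usually further ahead, because the library has already read a whole buffer; the exact value depends on the C library.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read one byte via stdio, then report both the
 * stream position and the kernel file offset of the same descriptor.
 * Returns 0 on success, -1 on error. */
int positions_after_one_read(const char *path, long *stdio_pos, long *kernel_pos)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    int ok = 0;
    if (fgetc(f) != EOF) {
        /* Position as seen by the stdio layer (after one byte). */
        *stdio_pos = ftell(f);
        /* Offset as seen by the kernel; reflects how much was read into the buffer. */
        *kernel_pos = (long)lseek(fileno(f), 0, SEEK_CUR);
        ok = (*stdio_pos >= 0 && *kernel_pos >= 0);
    }

    fclose(f);
    return ok ? 0 : -1;
}
```

If the two values differ, an fseek to any position between them can typically be served from the user-space buffer without a system call, although whether a given library actually does so is implementation-specific.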

Any comments?

answered Oct 11 '22 03:10

Giuseppe Guerrini

Let us assume the ext2 FS and the Linux OS as an example. I don't think there will be a significant performance difference between an insert and an append. In both cases the file's inode and offset table must be read, the relevant disk sector mapped into memory, the data updated, and at some later point the data written back to disk. What will make a big performance difference in this example is good temporal and spatial locality when accessing parts of the file, since this will reduce the number of load/store combos.
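One thing worth keeping in mind when comparing the two cases: with standard C stdio, a write after fseek() to the middle of a file replaces the bytes that are already there; it does not push the rest of the file back. A minimal sketch of both operations follows (the helper names `write_at`, `append_to` and `file_size` are made up for illustration):

```c
#include <stdio.h>

/* Hypothetical helper: size of the file in bytes, or -1 on error. */
long file_size(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0)
        size = ftell(f);
    fclose(f);
    return size;
}

/* Write len bytes at the given offset. Existing bytes are overwritten
 * in place; the file only grows if the write runs past the old end. */
int write_at(const char *path, long offset, const void *buf, size_t len)
{
    FILE *f = fopen(path, "r+b");
    if (!f)
        return -1;
    int ok = fseek(f, offset, SEEK_SET) == 0
          && fwrite(buf, 1, len, f) == len;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Append len bytes at the end of the file. */
int append_to(const char *path, const void *buf, size_t len)
{
    FILE *f = fopen(path, "ab");
    if (!f)
        return -1;
    int ok = fwrite(buf, 1, len, f) == len;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

With the numbers from the question, `write_at(path, 512 * 1024, data, 16 * 1024)` on a 1MB file leaves the size at 1MB, while `append_to(path, data, 16 * 1024)` grows it by 16KB. In both cases only the touched 16KB range is written.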

As a previous answer says, you may be able to speed up both operations if you deal with data writes that are exact multiples of the FS block size; in this case you could skip the load stage and just insert the new blocks into the file's inode data structure. This would not be practical, however, as you would need low-level access to the FS driver, and using it would be very restrictive and not portable.
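Without that kind of low-level access, a true insert (one that keeps the old data and makes the file longer) has to be done by the application. A minimal portable sketch, assuming the tail of the file fits in memory, is shown here; `insert_at` is a made-up name, not a library function. It reads everything from the insertion point to the end, writes the new data, then writes the saved tail back, so its cost grows with the size of the tail rather than with the size of the inserted data.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: insert len bytes at offset, shifting the rest of
 * the file towards the end. Assumes the tail fits in memory.
 * Returns 0 on success, -1 on error. */
int insert_at(const char *path, long offset, const void *data, size_t len)
{
    FILE *f = fopen(path, "r+b");
    if (!f)
        return -1;

    /* Determine the current file size. */
    if (fseek(f, 0, SEEK_END) != 0) {
        fclose(f);
        return -1;
    }
    long size = ftell(f);
    if (size < 0 || offset < 0 || offset > size) {
        fclose(f);
        return -1;
    }

    /* Save everything from the insertion point to the end of the file. */
    size_t tail_len = (size_t)(size - offset);
    char *tail = malloc(tail_len ? tail_len : 1);
    if (!tail) {
        fclose(f);
        return -1;
    }

    int ok = fseek(f, offset, SEEK_SET) == 0
          && fread(tail, 1, tail_len, f) == tail_len
          /* A seek is required when switching from reading to writing. */
          && fseek(f, offset, SEEK_SET) == 0
          && fwrite(data, 1, len, f) == len
          && fwrite(tail, 1, tail_len, f) == tail_len;

    free(tail);
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For the example in the question, inserting 16KB at the 512KB offset of a 1MB file this way rewrites roughly 528KB (the new data plus the 512KB tail), whereas an append writes only the 16KB. A failure halfway through leaves the file in an inconsistent state, so real code would more likely write to a temporary file and rename it.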

answered Oct 11 '22 01:10

PinkyNoBrain
