 

Advantages of mmap vs fileinput

I read that mmap has an advantage over fileinput because it reads a page into the kernel page cache and shares that page with the user address space, whereas fileinput brings a page into the kernel and then copies each line into the user address space. So fileinput carries this extra space overhead.

So I am planning to move to mmap, but first I would like to hear from advanced Python hackers: does it actually improve performance?

If so, is there an implementation similar to fileinput that uses mmap?

Please point me to any open-source code, if you are aware of any.

Thank you.

Boolean asked Nov 14 '22 01:11


1 Answer

mmap maps a file into your process's memory so that you can index it like an array of bytes or treat it as one big data structure. The OS pulls pages of the file into RAM on demand as you touch them.
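A minimal sketch of what "index it like an array of bytes" looks like with Python 3's standard `mmap` module. The helper name `inspect_mapping` and the sample bytes are illustrative only; a temporary file stands in for a real one.

```python
import mmap
import tempfile


def inspect_mapping(data):
    """Write `data` to a temp file, map it read-only, and probe it like a bytes array."""
    with tempfile.TemporaryFile() as f:
        f.write(data)
        f.flush()  # make sure the bytes are in the file before mapping it
        # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return (
                len(mm),            # size of the mapping in bytes
                mm[0],              # indexing a single position yields an int
                mm[6:10],           # slicing yields bytes
                mm.find(b"world"),  # search the mapped bytes directly
            )


print(inspect_mapping(b"hello mmap world\n"))  # → (17, 104, b'mmap', 11)
```

Note that mmap cannot map an empty file, so `data` must be non-empty here.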

It's a lot faster if you access your file in a "random-access" manner, that is, doing a lot of fseek(), fread(), fwrite() combinations.
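To make the random-access case concrete, here is a sketch under an assumed, hypothetical file format: fixed-width records, one little-endian 64-bit integer each. `read_with_seek` does the seek-and-read dance for every lookup, while `read_with_mmap` just unpacks at an offset inside the mapping. The names and the record format are made up for illustration.

```python
import mmap
import struct
import tempfile

# Hypothetical record format for illustration: one little-endian int64 per record.
RECORD = struct.Struct("<q")


def read_with_seek(f, index):
    # Classic approach: every lookup goes through the file object's seek() and read().
    f.seek(index * RECORD.size)
    return RECORD.unpack(f.read(RECORD.size))[0]


def read_with_mmap(mm, index):
    # With mmap, a lookup is just an offset into the mapped buffer.
    return RECORD.unpack_from(mm, index * RECORD.size)[0]


def lookup_both_ways(indices, count=1000):
    """Store the squares of 0..count-1 as records, then fetch `indices` both ways."""
    with tempfile.TemporaryFile() as f:
        f.write(b"".join(RECORD.pack(i * i) for i in range(count)))
        f.flush()
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            via_mmap = [read_with_mmap(mm, i) for i in indices]
        via_seek = [read_with_seek(f, i) for i in indices]
    return via_mmap, via_seek


print(lookup_both_ways([3, 999, 0, 500]))
# → ([9, 998001, 0, 250000], [9, 998001, 0, 250000])
```

Both paths return the same values; how much the mmap path actually saves per lookup depends on your workload and OS, so it is worth measuring rather than assuming.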

But if you are just reading the file in and processing each line once (say), it is unlikely to be significantly faster. In fact, for any reasonable file size it is probably indistinguishable. (Remember that with mmap the whole file is mapped into your address space; if it doesn't all fit in RAM, paging occurs, which begins to reduce the efficiency of mmap.)
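As for a fileinput-like reader built on mmap: the standard `fileinput` module reads through ordinary file objects, not mmap. Below is a hypothetical helper, `mmap_lines`, that mimics only fileinput's basic "iterate over several files" behavior (it yields raw bytes lines and, unlike fileinput, does not fall back to stdin when given no files), plus a rough timing harness for checking the sequential-read claim on your own machine. No numbers are claimed here; results will vary with file size, page-cache state, and Python version.

```python
import fileinput
import mmap
import os
import tempfile
import time


def mmap_lines(paths):
    """Yield (path, line) pairs from each file in turn, reading through mmap.

    Hypothetical helper (not a stdlib API): mimics only fileinput's basic
    iteration, yields raw bytes, and does not fall back to stdin.
    """
    for path in paths:
        # mmap refuses to map an empty file, so skip zero-length files.
        if os.path.getsize(path) == 0:
            continue
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                # readline() returns b"" once the end of the mapping is reached.
                for line in iter(mm.readline, b""):
                    yield path, line


def count_with_fileinput(paths):
    # mode="rb" so both readers hand back bytes lines.
    with fileinput.input(files=paths, mode="rb") as fi:
        return sum(1 for _ in fi)


def count_with_mmap(paths):
    return sum(1 for _ in mmap_lines(paths))


def best_time(func, paths, repeat=3):
    # Best of several runs; later runs likely hit a warm page cache.
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(paths)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Illustrative input only: 100,000 short lines in a temporary file.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.writelines(b"line %d\n" % i for i in range(100_000))
    try:
        for func in (count_with_fileinput, count_with_mmap):
            print(func.__name__, func([path]), f"{best_time(func, [path]):.4f}s")
    finally:
        os.remove(path)
```

For a one-pass, line-by-line job like this, both readers do the same amount of work per line in Python, which is why the difference tends to be small.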

AndrewStone answered Dec 25 '22 13:12