 

Is os.walk() much faster after the first run due to page caching?

Tags: python, caching

I use os.walk to iterate over, say, 1000 files (just iteration; no processing is done on these files). The first run is slow, but subsequent runs on the same path are about 20 times faster.
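Here is a minimal sketch for reproducing the measurement. It assumes Python 3 and a directory tree you can point it at. `walk_count` and `time_walks` are hypothetical helper names. The sketch times several consecutive os.walk passes over the same root, so you can compare the first (cold) run with the later (warm) ones.

```python
import os
import time


def walk_count(root):
    """Walk `root` with os.walk and return how many files were seen."""
    count = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        count += len(filenames)  # iteration only, no per-file work
    return count


def time_walks(root, runs=3):
    """Walk the same tree `runs` times; return a (file_count, seconds) pair per run."""
    results = []
    for _ in range(runs):
        start = time.perf_counter()
        n = walk_count(root)
        results.append((n, time.perf_counter() - start))
    return results
```

On a tree that hasn't been read recently, `time_walks(path)` typically reports the largest time for the first run. How large the gap is depends on the OS, the filesystem, and the storage hardware.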

As far as I know, os.walk and os.listdir (which os.walk uses) don't do any caching, and neither do FindFirstFile/FindNextFile (which os.listdir uses on my Windows platform).
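You can quickly check that Python is not holding its own copy of a listing. Each os.listdir call returns a fresh list, so a file created between two calls appears in the second one. This is a minimal sketch using a scratch directory, and `listing_reflects_new_file` is a hypothetical helper name. The check only rules out a stale Python-side copy. It says nothing about the OS cache, which stays coherent with the filesystem and so never shows stale results either.

```python
import os
import tempfile


def listing_reflects_new_file():
    """Return (before, after) listings of a scratch dir around a file creation."""
    with tempfile.TemporaryDirectory() as d:
        before = sorted(os.listdir(d))  # fresh call: directory is still empty
        with open(os.path.join(d, "new.txt"), "w"):
            pass  # create an empty file
        after = sorted(os.listdir(d))  # a second fresh call sees the new entry
    return before, after
```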

So is this due to page caching, or something else?

FYI, I'm writing a backup application that needs to process a huge number of files. If this is indeed due to page caching, I'll need to write my own caching mechanism.
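One possible shape for an application-level cache in a backup tool is a manifest of per-file metadata. You record a (size, mtime_ns) signature for every file and, on the next run, only treat files whose signature changed as needing work. This is a sketch under assumptions, not a complete design. The helper names are hypothetical, and persisting `previous` between runs (for example, as JSON) is left out. The signature also misses edits that leave both size and timestamp unchanged. The sketch uses os.scandir directly because each directory entry can carry stat information. In Python 3.5 and later, os.walk is itself built on os.scandir. Note that the directory walk still happens every run, and it is the OS cache that keeps repeated walks fast.

```python
import os


def scan_signatures(root):
    """Map each regular file under `root` to a (size, mtime_ns) signature.

    Walks the tree with an explicit stack over os.scandir; symlinks are
    not followed, so linked directories and files are skipped.
    """
    signatures = {}
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as it:
            for entry in it:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    st = entry.stat(follow_symlinks=False)
                    signatures[entry.path] = (st.st_size, st.st_mtime_ns)
    return signatures


def changed_files(previous, current):
    """Return sorted paths that are new or whose (size, mtime_ns) differs."""
    return sorted(
        path for path, sig in current.items() if previous.get(path) != sig
    )
```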

asked Dec 24 '22 23:12 by fans656


1 Answer

Your OS does the caching here. Directory lookups require disk access, which is slow, so that access is heavily cached.

For example, the ntfs.sys driver uses the Data Map service to cache filesystem metadata such as directory listings.

answered Feb 23 '23 11:02 by Martijn Pieters