
How does os.listdir() perform on very large folders?

I am about to receive a huge folder of data: approximately 2 TB in total, made up of about 2 million files. I will need to do some processing on those files (mainly deleting 99% of them).

I anticipate some issues due to the size of the data. In particular, I would like to know if Python is able to list these files correctly using os.listdir() in a reasonable time.

For instance, I know from experience that in some cases, deleting huge folders like this one on Ubuntu can be painful.

Joseph Budin asked Oct 24 '25 03:10


1 Answer

os.scandir was created largely because of issues with using os.listdir on huge directories, so I would expect os.listdir to suffer in the scenario you describe and os.scandir to perform better. There are two reasons: it can process the folder with lower memory consumption, since it yields entries lazily instead of building the full list of names up front, and (typically) you benefit at least a little from avoiding per-entry stat calls (e.g. to distinguish files from directories).
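
As a rough illustration of that approach, here is a minimal sketch that walks one folder with os.scandir and deletes the files you do not want to keep. The helper names iter_files and prune, and the keep predicate, are hypothetical and only there for the example; it also assumes all files sit directly in one folder (no recursion) and that your filesystem is fine with unlinking files while the scan is still in progress, which you may want to verify on a small copy first.

```python
import os


def iter_files(folder):
    """Lazily yield os.DirEntry objects for regular files directly in folder."""
    # scandir returns an iterator, so the 2 million names are never
    # held in memory at once the way os.listdir's list would be.
    with os.scandir(folder) as entries:
        for entry in entries:
            # is_file() can usually answer from the directory entry itself,
            # without an extra stat call per file.
            if entry.is_file(follow_symlinks=False):
                yield entry


def prune(folder, keep):
    """Delete each regular file in folder whose name fails keep(name).

    Returns the number of files removed. `keep` is a hypothetical
    predicate supplied by the caller.
    """
    removed = 0
    for entry in iter_files(folder):
        if not keep(entry.name):
            os.unlink(entry.path)
            removed += 1
    return removed
```

You would call it as something like prune("/path/to/data", lambda name: name.endswith(".csv")), with whatever rule decides which 1% of the files survive.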

ShadowRanger answered Oct 26 '25 22:10


