
Workaround for OSError with os.listdir

Tags: python, macos

I have a directory with 90K files in it. This is such a preposterously huge number of files that shell commands like ls fail. So, of course, does os.listdir() in my Python script (Mac Python, version 2.5); it fails with OSError: [Errno 12] Cannot allocate memory: '.'

People will say "Don't put that many files in a single directory! Are you crazy?" -- but I like to pretend that I live in the future, a brilliant, glowing place, where I have gigabytes of memory at my disposal, and don't need to worry too much about where exactly my files go, as long as there's rust left on my spinning platters.

So, is there a good workaround for this os.listdir() problem? I've considered just shelling out to find, but that's a bit gross, and unfortunately find is recursive, with no supported maxdepth option on Mac OS X 10.6.

Here's roughly what an os.listdir replacement that shells out to find looks like:

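import subprocess

def ls(directory):
    # Pass the arguments as a list so odd characters in the path need no shell quoting.
    proc = subprocess.Popen(['find', directory], stdout=subprocess.PIPE,
                            universal_newlines=True)
    files = proc.communicate()[0].rstrip().split('\n')
    files.remove(directory)  # find prints the directory itself as well
    return files # probably want to remove dir prefix from everything in here too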
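
The prefix removal mentioned in that comment, and the fact that find recurses, could both be handled by filtering find's output. This is only a minimal sketch, assuming '/'-separated paths as find prints them; direct_children is a hypothetical helper name, not part of any library:

```python
def direct_children(paths, directory):
    """Yield names of entries directly inside `directory` (hypothetical helper)."""
    prefix = directory.rstrip('/') + '/'
    for path in paths:
        if not path.startswith(prefix):
            continue  # the directory itself, or an unrelated path
        # Strip the prefix; lstrip covers doubled slashes such as 'dir//name'.
        name = path[len(prefix):].lstrip('/')
        if name and '/' not in name:
            yield name  # entries in subdirectories still contain '/' and are skipped
```

With the function above, that would be something like list(direct_children(ls('.'), '.')). Note that find still walks every subdirectory, so this only trims the result; it doesn't make the scan any cheaper.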

Update: os.listdir() succeeds in Python 2.6.

Jason Sundram asked Nov 04 '10 16:11


1 Answer

You're hitting a historical artifact in Python: os.listdir should return an iterator, not a list. I think this function predates iterators, though it's odd that no os.xlistdir has been added.

This affects more than just memory usage on huge directories. Even on a directory with just a few thousand files, you have to wait for the entire directory scan to complete, even if the first entry is the one you were looking for.

This is a pretty glaring lack in Python: there appears to be no binding to the low-level opendir/readdir/fdopendir APIs, so it seems like it's not even possible to implement this yourself without writing a native module. This is one of those cases where it's such a huge, gaping hole in the standard library that I doubt myself and suspect I'm just not seeing it--there are low-level open, stat, etc. bindings, and this is in the same category.
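
For anyone reading this on a newer Python: os.scandir (added in 3.5, usable as a context manager from 3.6) yields directory entries lazily, which is the iterator-style listing described above. A minimal sketch, assuming Python 3.6 or later is available; iter_names and find_first are hypothetical helper names, not standard library functions:

```python
import os

def iter_names(directory):
    """Yield entry names one at a time instead of building a full list."""
    with os.scandir(directory) as entries:
        for entry in entries:
            yield entry.name

def find_first(directory, predicate):
    """Return the first name for which predicate(name) is true, else None."""
    with os.scandir(directory) as entries:
        for entry in entries:
            if predicate(entry.name):
                # Returning here leaves the with block, which closes the
                # directory handle without reading the remaining entries.
                return entry.name
    return None
```

Something like find_first('.', lambda name: name.endswith('.log')) stops as soon as a match turns up, so a hit near the start of a huge directory doesn't pay for the rest of the scan. Entries come back in arbitrary order, just as with os.listdir.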

Glenn Maynard answered Oct 04 '22 21:10
