
Python - walk through a huge set of files but in a more efficient manner

Tags:

python

os.walk

I have a huge set of files that I want to traverse using Python. I am using os.walk(source) for this, and it works, but because the file set is so large it takes too much time and memory, since it builds the complete listing for each directory at once. How can I optimize this to use fewer resources, perhaps walking one directory at a time or in some other efficient manner, while still iterating over the complete set of files? Thanks

import os

ignoreList = []
for dirpath, dirnames, filenames in os.walk(START_FOLDER):
    for name in dirnames:
        # if PRIVATE_FOLDER not in name:
        for keyword in FOLDER_WITH_KEYWORDS_DELETION_EXCEPTION_LIST:
            if keyword in name.lower():
                ignoreList.append(name)
asked Feb 12 '14 by nirvana

People also ask

Is scandir faster than listdir?

scandir() is a directory iteration function like os.listdir(), except that instead of returning a list of bare filenames, it yields DirEntry objects that include file type and stat information along with the name. Using scandir() increases the speed of os.walk() significantly.

1 Answer

If the issue is that the directory simply has too many files in it, this will hopefully be solved in Python 3.5.

Until then, you may want to check out scandir.
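For example, here is a minimal sketch of a lazy, scandir-based traversal, assuming Python 3.5+ (or the scandir backport with os.scandir swapped for scandir.scandir). It visits one directory at a time and yields each file path as it is found, never holding the whole tree in memory; the iter_files name is just for illustration. Note that on 3.5+ os.walk itself is already reimplemented on top of scandir.

```python
import os

def iter_files(root):
    """Yield file paths one at a time, processing one directory at a time."""
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as it:
            for entry in it:
                if entry.is_dir(follow_symlinks=False):
                    # Remember subdirectories to visit later.
                    stack.append(entry.path)
                else:
                    yield entry.path
```

Usage is the same as any generator: `for path in iter_files(start_folder): ...` pulls paths on demand, so memory use stays proportional to a single directory listing plus the pending-directory stack.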

answered Oct 13 '22 by Ethan Furman