 

Count number of files with certain extension in Python

Tags:

python

file

count

I am fairly new to Python and I am trying to figure out the most efficient way to count the number of .TIF files in a particular sub-directory.

While searching, I found one example (which I have not tested) that claims to count all of the files in a directory and its subdirectories:

import os
file_count = sum(len(f) for _, _, f in os.walk(myPath))

This is fine, but I need to count only TIF files. My directory will contain other file types, but I only want to count TIFs.

Currently I am using the following code:

import os

tifCounter = 0
for root, dirs, files in os.walk(myPath):  # recursive: visits every subdirectory
    for file in files:
        if file.endswith('.tif'):  # case-sensitive: '.TIF' would not match
            tifCounter += 1

It works fine, but the looping seems excessive/expensive to me. Is there a more efficient way to do this?

Thanks.

asked Aug 24 '09 at 06:08 by Bryan Lewis


People also ask

How do I count the number of files in Python?

Getting a count of the files in a directory is easy as pie! Use os.listdir() together with os.path.isfile() to count the regular files directly inside a directory.
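A minimal sketch of that approach, assuming Python 3 and a hypothetical helper named `count_files`. It counts only regular files directly inside the given directory, so subdirectories themselves and the files inside them are not included.

```python
import os


def count_files(path):  # hypothetical helper name
    """Count regular files directly inside `path` (non-recursive)."""
    return sum(
        1
        for name in os.listdir(path)
        # os.listdir returns bare names, so join each one with the directory
        # before asking whether it is a regular file (not a subdirectory).
        if os.path.isfile(os.path.join(path, name))
    )
```

For example, `count_files(".")` counts the files in the current working directory.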

How do you list all files in a directory with a certain extension in Python?

The os.listdir() function lists all entries (files and subdirectories) directly inside a directory; filter the names with endswith() to keep only one extension. Use os.walk() if you need to include subdirectories as well.
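The difference can be sketched with a hypothetical helper `files_with_extension` (the name and signature are illustrative, not a standard API): `os.listdir()` covers one directory, while `os.walk()` descends into every subdirectory. Matching here is case-sensitive, as in the question's loop.

```python
import os


def files_with_extension(path, ext, recursive=False):  # hypothetical helper
    """Return paths of files under `path` whose names end with `ext`.

    With recursive=False only `path` itself is listed (os.listdir);
    with recursive=True every subdirectory is visited too (os.walk).
    """
    if not recursive:
        return [
            os.path.join(path, name)
            for name in os.listdir(path)
            # listdir also returns subdirectories, so keep regular files only
            if name.endswith(ext) and os.path.isfile(os.path.join(path, name))
        ]
    return [
        os.path.join(root, name)
        # os.walk yields (dirpath, dirnames, filenames) for each directory;
        # filenames already excludes subdirectories
        for root, _dirs, names in os.walk(path)
        for name in names
        if name.endswith(ext)
    ]
```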

How do you check if a file has a certain extension Python?

To check whether a file name has a certain extension in Python, use the string's endswith() method. It returns True if the string ends with the specified suffix and False otherwise.
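A small sketch of that check. `str.endswith` also accepts a tuple of suffixes, so one call can test several extensions, and lowercasing the name first makes the check case-insensitive (so `scan.TIF` matches too). The helper name `has_extension` and the `.tif`/`.tiff` defaults are illustrative assumptions.

```python
def has_extension(filename, suffixes=(".tif", ".tiff")):  # hypothetical helper
    """Return True if `filename` ends with any of `suffixes`, ignoring case.

    The suffixes must be given in lower case, because only the file name
    is lowercased before comparing.
    """
    # str.endswith accepts a tuple and returns True if any suffix matches.
    return filename.lower().endswith(suffixes)


print(has_extension("scan.TIF"))        # True
print(has_extension("scan.tiff"))       # True
print(has_extension("notes.txt"))       # False
print(has_extension("archive.tif.gz"))  # False
```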

How do I count the number of files in a directory?

To find how many entries there are in the current directory, run ls -1 | wc -l. This uses wc to count the lines (-l) in the output of ls -1. Note that it counts subdirectories as well as files, and it doesn't count dotfiles.


4 Answers

Something has to iterate over all the files in the directory and look at every single file name, whether that's your code or a library routine. So whatever the specific solution, they will all have roughly the same cost.

If you think it's too much code, and if you don't actually need to search subdirectories recursively, you can use the glob module:

import glob
import os
# Non-recursive: matches *.tif entries directly inside myPath only
tifCounter = len(glob.glob(os.path.join(myPath, "*.tif")))
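If you do need subdirectories, the same module can search recursively on Python 3.5+ via a `**` pattern and `recursive=True`. This is a sketch, not part of the original answer, and `count_tifs_recursive` is a hypothetical name. Keep in mind that glob matching is case-sensitive on POSIX systems and that names beginning with a dot are not matched by default.

```python
import glob
import os


def count_tifs_recursive(my_path):  # hypothetical helper name
    """Count *.tif files in my_path and in all of its subdirectories."""
    pattern = os.path.join(my_path, "**", "*.tif")
    # With recursive=True (Python 3.5+), "**" matches zero or more directory
    # levels, so files directly inside my_path are counted too. The
    # os.path.isfile check skips any directory whose name ends in .tif.
    return sum(1 for p in glob.glob(pattern, recursive=True) if os.path.isfile(p))
```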
answered Oct 17 '22 at 05:10 by Martin v. Löwis


For this particular use case, if you don't want to search subdirectories recursively, you can use os.listdir:

import os
# Count regular files (not directories) in myPath whose names end in .tif
tifCounter = len([f for f in os.listdir(myPath)
                  if f.endswith('.tif') and os.path.isfile(os.path.join(myPath, f))])
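On Python 3.6+, `os.scandir` is a possible alternative to combining `os.listdir` with `os.path.isfile`. Its `DirEntry.is_file()` can often reuse type information returned by the directory listing itself instead of issuing a separate `stat()` call per entry, although whether that happens depends on the platform and filesystem. A sketch, with `count_tifs_scandir` as a hypothetical name:

```python
import os


def count_tifs_scandir(my_path):  # hypothetical helper name
    """Count regular files directly inside my_path whose names end in .tif."""
    # os.scandir supports the context-manager protocol from Python 3.6 on;
    # sum() consumes the whole iterator before the with-block closes it.
    with os.scandir(my_path) as entries:
        return sum(
            1
            for entry in entries
            # entry.is_file() may avoid an extra stat() on many platforms
            if entry.name.endswith(".tif") and entry.is_file()
        )
```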
answered Oct 17 '22 at 05:10 by tonfa


Your code is fine.

Yes, you're going to need to loop over those files to filter out the .tif files, but looping over a small in-memory list is negligible compared to the work of scanning the directory on disk to find these files in the first place, which you have to do anyway.

I wouldn't worry about optimizing this code.
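A rough way to check this claim on your own directory tree, sketched under the assumption of Python 3.3+ (for `time.perf_counter`): time the directory walk and the in-memory filter separately. `time_walk_vs_filter` is a hypothetical helper. The numbers depend heavily on the filesystem and on OS caching (a repeated walk over the same tree can be much faster than the first), so treat this as a quick experiment rather than a benchmark.

```python
import os
import time


def time_walk_vs_filter(my_path):  # hypothetical helper name
    """Return (seconds spent walking, seconds spent filtering, .tif count)."""
    start = time.perf_counter()
    # Materialise the walk first so its I/O cost is measured on its own.
    listing = [files for _root, _dirs, files in os.walk(my_path)]
    walk_seconds = time.perf_counter() - start

    start = time.perf_counter()
    # Pure in-memory work: check every collected file name for the suffix.
    count = sum(1 for files in listing for name in files if name.endswith(".tif"))
    filter_seconds = time.perf_counter() - start

    return walk_seconds, filter_seconds, count
```

Calling `time_walk_vs_filter(myPath)` and printing the first two values shows how the two phases compare on a given tree.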

answered Oct 17 '22 at 07:10 by Kenan Banks


If you do need to search recursively, or for some other reason don't want to use the glob module, you could use:

import os
file_count = sum(len([f for f in fs if f.lower().endswith('.tif')]) for _, _, fs in os.walk(myPath))

This is the "Pythonic" way to adapt the example you found for your purposes. But it's not going to be significantly faster or more efficient than the loop you've been using; it's just a really compact syntax for more or less the same thing.
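On Python 3.4+, `pathlib` offers another recursive option with similar cost. This is a sketch rather than part of the answer, and `count_tifs_pathlib` is a hypothetical name. Comparing the lowercased suffix catches `.TIF` as well, in the same spirit as the `.lower()` call in the one-liner.

```python
from pathlib import Path


def count_tifs_pathlib(my_path):  # hypothetical helper name
    """Count .tif/.TIF files anywhere under my_path (recursive)."""
    return sum(
        1
        # rglob("*") yields every entry in the tree below my_path
        for p in Path(my_path).rglob("*")
        # p.suffix is the final extension, e.g. ".TIF" for "scan.TIF"
        if p.suffix.lower() == ".tif" and p.is_file()
    )
```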

answered Oct 17 '22 at 06:10 by David Z