What is the fastest / easiest way to count large number of files in a directory (in Linux)?

I had a directory with a large number of files. Every time I tried to list its contents, I either could not do it at all or there was a significant delay. I tried the ls command on the Linux command line, and the web interface from my hosting provider did not help either.

The problem is that when I just run ls, it takes a significant amount of time to even start displaying anything, so ls | wc -l would not help either.

After some research I came up with this code (in this example it counts the number of new emails on a server):

from os import walk
# walk() yields (root, dirs, files) for each directory under the given path
print sum([len(files) for (root, dirs, files) in walk('/home/myname/Maildir/new')])

The above code is written in Python. I ran it in Python's interactive interpreter and it worked pretty fast (it returned the result almost instantly).

I am interested in the answer to the following question: is it possible to count the files in a directory (without descending into subdirectories) any faster? What is the fastest way to do that?

asked May 21 '11 by Tadeck


3 Answers

I'm not sure about speed, but if you want to just use shell builtins this should work:

#!/bin/sh
# Count the entries matched by the glob, incrementing once per name.
COUNT=0
for file in /path/to/directory/*
do
    COUNT=$((COUNT+1))
done
echo "$COUNT"
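
One caveat worth noting: in plain POSIX sh, an empty directory leaves the glob unexpanded, so the loop runs once over the literal pattern and reports 1. A minimal guarded variant (a sketch, using the same placeholder path):

#!/bin/sh
# Same counting loop, but skip the literal pattern that a non-matching
# glob leaves behind; both -e and -L fail when nothing actually matched.
COUNT=0
for file in /path/to/directory/*
do
    [ -e "$file" ] || [ -L "$file" ] || continue
    COUNT=$((COUNT+1))
done
echo "$COUNT"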
answered Oct 06 '22 by Shea Levy


Total number of files in the given directory

find . -maxdepth 1 -type f | wc -l

Total number of files in the given directory and all subdirectories under it

find . -type f | wc -l

For more details, drop into a terminal and run man find.

answered Oct 06 '22 by Praveen Lobo


ls does a stat(2) call for every file. Other tools, like find(1) and shell wildcard expansion, may avoid this call and just do a readdir. One shell command combination that might work is find dir -maxdepth 1 | wc -l, but it will happily count the directory itself and miscount any filename that contains a newline.
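
A hedged variant (assuming GNU find, with a placeholder path) that sidesteps both pitfalls: -mindepth 1 excludes the directory itself, and printing one fixed character per entry means newlines in filenames cannot skew the count:

# GNU find: print one dot per directory entry and count bytes, not lines
find /path/to/directory -mindepth 1 -maxdepth 1 -printf '.' | wc -c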

From Python, the straightforward way to get just these names is os.listdir(directory). Unlike os.walk and os.path.walk, it does not need to recurse, check file types, or make further Python function calls.

Addendum: It seems ls doesn't always stat. At least on my GNU system, it can do only a getdents call when further information (such as which names are directories) is not requested. getdents is the underlying system call used to implement readdir in GNU/Linux.

Addendum 2: One reason for a delay before ls outputs results is that it sorts and tabulates the listing. ls -U1 may avoid this.
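
If you want to check this on your own system (assuming strace is available; the path is a placeholder), comparing syscall summaries of the two invocations should make the difference visible:

# -c prints a per-syscall summary: ls -l should show a stat-family call per
# entry, while ls -U1 should be dominated by getdents/getdents64.
strace -c ls -U1 /path/to/directory > /dev/null
strace -c ls -l  /path/to/directory > /dev/null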

answered Oct 06 '22 by Yann Vernier