
linux find on multiple patterns

Tags: linux, bash

I need to do a find on roughly 1500 file names and was wondering if there is a way to run multiple find commands simultaneously.

Right now I do something like

while read -r fil
do
  find . -name "$fil" >> outputfile
done < my_file

Is there a way to spawn multiple instances of find to speed up the process? Right now this loop takes about 7 hours, one file at a time.
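For reference, here is one way I could imagine parallelizing it. This is an untested sketch that assumes GNU parallel is installed; whether it actually helps presumably depends on whether the traversal or the disk is the bottleneck.

# Untested sketch, assuming GNU parallel is installed.
# Reads one pattern per line from my_file and runs several finds concurrently;
# parallel buffers each job's output, so lines are not interleaved.
parallel -a my_file find . -name {} > outputfile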

asked Oct 16 '25 by mike

1 Answer

Given the 7-hour runtime you mention, I presume the file system holds some millions of files, so that OS disk buffers loaded by one query are overwritten before the next query can reuse them. You can test this hypothesis by timing the same find a few times, as in the following example.

tini ~ > time find . -name IMG_0772.JPG -ls
25430459 9504 lrwxrwxrwx   1 omg omg  9732338 Aug  1 01:33 ./pix/rainbow/IMG_0772.JPG
20341373 5024 -rwxr-xr-x   1 omg omg  5144339 Apr 22  2009 ./pc/2009-04/IMG_0772.JPG
22678808 2848 -rwxr-xr-x   1 omg omg  2916237 Jul 21 21:03 ./pc/2012-07/IMG_0772.JPG

real    0m15.823s
user    0m0.908s
sys 0m1.608s

tini ~ > time find . -name IMG_0772.JPG -ls
25430459 9504 lrwxrwxrwx   1 omg omg  9732338 Aug  1 01:33 ./pix/rainbow/IMG_0772.JPG
20341373 5024 -rwxr-xr-x   1 omg omg  5144339 Apr 22  2009 ./pc/2009-04/IMG_0772.JPG
22678808 2848 -rwxr-xr-x   1 omg omg  2916237 Jul 21 21:03 ./pc/2012-07/IMG_0772.JPG

real    0m0.715s
user    0m0.340s
sys 0m0.368s

In the example, the second find ran much faster because the OS still had buffers in RAM from the first find. [On my small Linux 3.2.0-32 system, top currently reports 2.5 GB of RAM as buffers, 0.3 GB free, and 3.8 GB in use (i.e. about 1.3 GB for programs and the OS).]
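As an aside (not shown in the timings above): if you want to repeat the cold-cache measurement, Linux lets you drop the page cache between runs, provided you have root access.

sync                                        # flush dirty pages to disk first
echo 3 | sudo tee /proc/sys/vm/drop_caches  # drop page cache plus dentries and inodes
time find . -name IMG_0772.JPG -ls          # now times the cold-cache case again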

Anyhow, to speed up processing, you need to find a way to make better use of OS disk buffering. For example, double or quadruple your system memory.

As an alternative, try the locate command. The query

time locate IMG_0772.JPG

consistently takes under a second on my system. You may wish to run updatedb just before starting the job that looks up the 1500 file names; see man updatedb. If the directory . in your find covers only a small part of the overall file system, so that the locate database would include numerous irrelevant files, use prune options when you run updatedb to minimize the size of the database that locate scans; afterwards, run a plain updatedb to restore the other filenames to the database. Using locate, you can probably cut the run time to about 20 minutes.
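As a rough sketch of how the whole job might look with locate (assuming an mlocate-style locate that supports the -b option; adjust for your implementation):

sudo updatedb                  # refresh the locate database first (see man updatedb)
while read -r fil
do
  locate -b "$fil"             # -b matches against the basename only
done < my_file > outputfile

Note that locate patterns with no globbing characters match as substrings, so the results can be a superset of what find . -name "$fil" returns.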

answered Oct 18 '25 by James Waldby - jwpat7

