
Which is faster, 'find -exec' or 'find | xargs -0'?

In my web application I render pages using a PHP script, and then generate static HTML files from them. The static HTML files are served to users to improve performance. The HTML files eventually become stale and need to be deleted.

I am debating between two ways to write the eviction script.

The first is using a single find command, like

find /var/www/cache -type f -mmin +10 -exec rm \{} \;

The second form is by piping through xargs, something like

find /var/www/cache -type f -mmin +10 -print0 | xargs -0 rm

The first form invokes rm for each file it finds, while the second form just sends all the file names to a single rm (but the file list might be very long).

Which form would be faster?

In my case, the cache directory is shared between a few web servers, so this is all done over NFS, if that matters for this issue.

yhager asked Jun 11 '09 10:06


3 Answers

With a lot of files, the xargs version is dramatically faster than the -exec version as you posted it. The -exec ... \; form executes rm once for each file you want to remove, while xargs lumps as many files as possible together into a single rm command.

With tens or hundreds of thousands of files, it can be the difference between a minute or less and the better part of an hour.

You can get the same behavior with -exec by finishing the command with a "+" instead of "\;". The "+" terminator is specified by POSIX, but it is only available in newer versions of find.

The following two are roughly equivalent:

find . -print0 | xargs -0 rm
find . -exec rm \{} +
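
To see the batching directly, here is a minimal sketch that counts how many processes each form spawns. It assumes a find and xargs that support -print0 and -0, uses a throwaway directory from mktemp (a hypothetical stand-in, not the real cache path), and substitutes a harmless sh -c 'echo run' for rm so that each invocation prints exactly one line.

```shell
#!/bin/sh
# Hypothetical scratch directory with 50 dummy files; not the real cache path.
dir=$(mktemp -d)
i=1
while [ "$i" -le 50 ]; do
    : > "$dir/page$i.html"
    i=$((i + 1))
done

# Each invocation of the inner sh prints one line, so the line count
# equals the number of times the command was executed.
per_file=$(find "$dir" -type f -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')
plus=$(find "$dir" -type f -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')
piped=$(find "$dir" -type f -print0 | xargs -0 sh -c 'echo run' sh | wc -l | tr -d ' ')

printf '%s: %s\n' "exec with semicolon" "$per_file"
printf '%s: %s\n' "exec with plus" "$plus"
printf '%s: %s\n' "find piped to xargs" "$piped"

rm -rf "$dir"
```

With 50 short file names, the first form should report 50 invocations and the other two should report 1 each, since all the names fit on one command line.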

Note that the xargs version may still run slightly faster (by a few percent) on a multi-processor system, because find and rm run as separate processes in the pipeline and some of the work can be parallelized. This is particularly true if a lot of computation is involved.

tylerl answered Sep 30 '22 21:09


I expect the xargs version to be slightly faster, as you aren't spawning a process for each filename. But I would be surprised if there was actually much difference in practice. If you're worried about the long list xargs sends to each invocation of rm, you can use -n with xargs to limit the number of arguments it will pass per invocation. However, xargs knows the maximum command-line length and won't go beyond that.
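
As a rough illustration of capping the batch size, the sketch below uses the xargs -n (max-args) option. The five file names are made-up placeholders fed in NUL-terminated, the way find -print0 would emit them, and echo stands in for rm so nothing is deleted.

```shell
#!/bin/sh
# Five NUL-terminated dummy names stand in for find's -print0 output.
# With -n 2, xargs passes at most two arguments to each echo invocation.
batches=$(printf '%s\0' a.html b.html c.html d.html e.html | xargs -0 -n 2 echo)
printf '%s\n' "$batches"

# One output line per echo invocation.
invocations=$(printf '%s\n' "$batches" | wc -l | tr -d ' ')
printf 'invocations: %s\n' "$invocations"
```

Five names with a limit of two per call should give three invocations. In the eviction script the same flag would sit between xargs -0 and rm.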

kbyrd answered Sep 30 '22 19:09


The find command has a -delete option built in; perhaps that could be useful as well? http://lists.freebsd.org/pipermail/freebsd-questions/2004-July/051768.html
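
A minimal sketch of how -delete might look for the eviction case, assuming a find that supports the -mmin and -delete extensions (GNU and BSD find do; they are not part of POSIX). The directory and file names are hypothetical, and one file is backdated with touch -t so that it falls outside the 10-minute window.

```shell
#!/bin/sh
dir=$(mktemp -d)   # hypothetical scratch directory standing in for the cache
: > "$dir/fresh.html"
: > "$dir/stale.html"
# Backdate one file so it is older than the 10-minute cutoff.
touch -t 200001010000 "$dir/stale.html"

# -delete removes each match from inside find itself, so no rm process
# is spawned. It must come after the tests, or it would apply to every file.
find "$dir" -type f -mmin +10 -delete

remaining=$(ls "$dir")
printf 'remaining: %s\n' "$remaining"
rm -rf "$dir"
```

Only fresh.html should be left afterwards. Placing -delete last matters: find evaluates its expression left to right, so tests that come after -delete no longer restrict what gets removed.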

natevw answered Sep 30 '22 21:09