
Fastest way to delete files from a directory tree whose names contain a certain string

I have a directory containing subdirectories, and I would like to delete every file in them whose name contains the string "out". What is the fastest way to do this?

I have tried several things.

A simple:

rm */*out*

Perl:

perl -e 'for ( <*/*out*> ) { ( (stat)[9] < (unlink) ) }'

Both take a serious amount of time. For 1,000 subdirectories, each containing around 50 files matching *out*, the timings are:

Perl:        ~25 mins
rm */*out* : ~18 mins

I also tried rsync, moving the files to a folder first and then syncing with delete, but that took ages.

Does anyone have a faster way of getting rid of these files, as this seems inordinately slow to me?

abinitio, asked Sep 12 '25 19:09


1 Answer

I find test3 is the fastest (11-25 sec). But why not test it yourself?

Your filesystem can have a big impact on the performance.

The test uses GNU Parallel.

# Make test set: 150000 files, 50000 named *.seq
testset() {
  doit() { mkdir -p "$1" && cd "$1" && parallel --results ./{} seq ::: {1..50}; }
  export -f doit
  seq 1000 | parallel --bar doit >/dev/null

  # Drop caches before starting a test
  echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null
}
export -f testset

# Define tests
test1() {
  find . -name '*seq' | perl -ne 'chop;unlink'
}
export -f test1
test2() {
  find . -name '*seq' -delete
}
export -f test2
test3() {
  find . -name '*seq' | parallel --pipe -N1000 -q perl -ne 'chop;unlink'
}
export -f test3
test4() {
  find . -name '*seq' -print0 | xargs -0 -P2 rm
}
export -f test4
test5() {
  find . -name '*seq' -print0 | xargs -0 rm
}
export -f test5
test6() {
  find . -name '*seq' | perl -e 'chomp(@a=<>);unlink @a'
}
export -f test6
test7() {
  # sort by inode
  ls -U -i */*seq* | sort -k1,1 -n| cut -d' ' -f2- | perl -e 'chomp(@a=<>);unlink @a'
}
export -f test7

# Run testset/test? alternating
eval parallel --joblog jl -uj1 ::: testset' 'test{1..7} 
# sort by runtime
sort -nk4 jl
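
The tests above match '*seq' because that is what the generated test set contains. Applied to the question's case, the pattern would be '*out*'. Below is a minimal sketch of the test2 approach (find with -delete) using that pattern. It runs against a throwaway tree created with mktemp, and the directory and file names are made up for illustration. It assumes a find that supports -delete (GNU and BSD find do; it is not in POSIX). Printing the matches first is a cheap way to check the pattern before anything is removed.

```shell
#!/usr/bin/env bash
# Sketch only: delete files whose names contain "out" in a scratch tree.
set -eu

# Build a small throwaway tree (hypothetical names) so no real data is touched.
scratch=$(mktemp -d)
trap 'rm -rf "$scratch"' EXIT
for d in 1 2 3; do
  mkdir -p "$scratch/dir$d"
  touch "$scratch/dir$d/run.out" "$scratch/dir$d/stdout.log" "$scratch/dir$d/input.dat"
done

# Preview: list what the pattern matches without removing anything.
find "$scratch" -type f -name '*out*' -print

# Same expression, now deleting (the approach of test2 above).
find "$scratch" -type f -name '*out*' -delete

# Count what is left; tr strips the padding some wc implementations add.
remaining=$(find "$scratch" -type f | wc -l | tr -d ' ')
echo "remaining files: $remaining"
```

On a real tree, replace "$scratch" with the top-level directory. Note that '*out*' also matches names such as stdout.log, so the preview step is worth running first.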
Ole Tange, answered Sep 14 '25 13:09