 

How to accelerate substitution when using GNU sed with GNU find?

I have the results of a numerical simulation that consist of hundreds of directories; each directory contains millions of text files.

I need to replace the string "wavelength;" with "wavelength_bc;", so I have tried both of the following:

find . -type f -exec sed -i 's/wavelength;/wavelength_bc;/g' {} \;

and

find . -type f -exec sed -i 's/wavelength;/wavelength_bc;/g' {} +

Unfortunately, the commands above take a very long time to finish (more than 1 hour).

I wonder how I can take advantage of the number of cores on my machine (8) to speed up the commands above.

I am thinking of using xargs with the -P flag, but I'm afraid that it will corrupt the files, so I don't know whether that is safe.

In summary:

  • How can I speed up sed substitutions when used with find?
  • Is it safe to use xargs -P to run them in parallel?

Thank you

Asked Mar 02 '23 10:03 by Iyach tharwa nambarek



1 Answer

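xargs -P should be safe to use here: find lists each file once and xargs hands each filename to exactly one sed process, so no two processes ever edit the same file. However, you will need to use the -print0 option of find and pipe into xargs -0 so that filenames containing spaces, quotes or newlines are handled correctly: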

find . -type f -print0 |
xargs -0 -I {} -P 0 sed -i 's/wavelength;/wavelength_bc;/g' {}

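The -P 0 option makes GNU xargs run as many processes at a time as possible. It is not capped at the number of CPU cores, so on an 8-core machine -P 8 (or -P "$(nproc)") is a more predictable choice. Also note that -I {} starts a separate sed for every single file; dropping -I {} and adding, for example, -n 100 lets each sed process a batch of files, which avoids millions of process start-ups.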
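Below is a minimal sketch of the same find/xargs/sed pipeline with one parallel slot per core and batched filenames, assuming GNU findutils, GNU sed and coreutils (for nproc). It builds a small hypothetical test tree in a temporary directory (names with spaces included), runs the substitution, and then counts what was replaced. The tree layout and the batch size of 100 are illustrative assumptions, not tuned values.

```shell
#!/usr/bin/env bash
# Sketch: parallel, batched in-place substitution with GNU find, xargs and sed.
# Assumption: "$root" is a hypothetical temporary test tree standing in for the real results.
set -euo pipefail

root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT

# Build a small test tree: 4 directories x 25 files, with spaces in the names.
for d in 1 2 3 4; do
  mkdir -p "$root/dir $d"
  for f in $(seq 1 25); do
    printf 'wavelength;\nwavelength; other\n' > "$root/dir $d/file $f.txt"
  done
done

# Run up to $(nproc) sed processes at a time, each given up to 100 files.
# -print0 / -0 keep filenames with spaces intact; each file goes to exactly one sed.
find "$root" -type f -print0 |
  xargs -0 -P "$(nproc)" -n 100 sed -i 's/wavelength;/wavelength_bc;/g'

# Count files still containing the old string (grep exits 1 when nothing matches,
# so '|| true' keeps set -e/pipefail from aborting; wc still prints 0).
remaining=$(grep -rlF 'wavelength;' "$root" | wc -l || true)
# Count lines now containing the new string (2 per file, 100 files).
replaced=$(grep -rhF 'wavelength_bc;' "$root" | wc -l)

echo "remaining=$remaining replaced=$replaced"   # prints: remaining=0 replaced=200
```

On the real data, "$root" would be the top-level results directory. If only a small fraction of the files actually contain the string, a grep pre-filter such as grep -rlZF 'wavelength;' . | xargs -0 -r -P "$(nproc)" -n 100 sed -i 's/wavelength;/wavelength_bc;/g' may help further, because sed -i rewrites every file it is given whether or not anything changes.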

Answered Mar 05 '23 16:03 by anubhava
