Running sed in parallel

I naively ventured to use the following command to process a data file:

cat old.one | parallel --pipe 'sed -r "s/\./\,/g"' > new.one

The goal was to replace every "." with ",". But the resulting file differs from the one produced by sequential processing:

sed -r "s/\./\,/g" old.one > new.one

Can the work be parallelized some other way? Ideally it would avoid semaphores and simply combine the parts at the end.

Solution

Thanks a lot! Here are my results:

  • sed: 13.834 s

    sed -r "s/\./\,/g" old.one > new.one

  • parallel sed: 12.489 s

    cat old.one | parallel -k --pipe 'sed -r "s/\./\,/g"' > new.one

  • tr: 6.480 s

    cat old.one | tr "." "," > new.one

  • parallel tr: 5.848 s

    cat old.one | parallel -k --pipe tr "." "," > new.one
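
Since tr came out roughly twice as fast as sed here, it is worth confirming that the two really produce the same file for this single-character replacement. A minimal sketch, assuming a small made-up sample in place of the real old.one (its contents are not shown in the question); the temporary directory and the new.sed / new.tr names are hypothetical:

```shell
#!/bin/sh
# Hypothetical sample data; it only stands in for the real old.one.
tmp=$(mktemp -d)
printf '1.5;2.25\n3.0;4.125\n' > "$tmp/old.one"

# Same substitution done two ways: sed with an escaped dot, and tr.
sed 's/\./,/g' "$tmp/old.one" > "$tmp/new.sed"
tr '.' ',' < "$tmp/old.one" > "$tmp/new.tr"

# cmp -s is silent and only reports through its exit status.
if cmp -s "$tmp/new.sed" "$tmp/new.tr"; then
    same=yes
else
    same=no
fi
echo "sed and tr agree: $same"

rm -rf "$tmp"
```

Note that the dot has to be escaped in the sed expression; an unescaped "." matches any character, whereas tr treats it literally.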

Stanislav Fyodorov asked Apr 19 '16 18:04


1 Answer

If this gives the correct result (-j1, one job at a time):

cat old.one | parallel -j1 --pipe 'sed -r "s/\./\,/g"' > new.one

then this should work too (-k keeps the output in the same order as the input):

cat old.one | parallel -k --pipe 'sed -r "s/\./\,/g"' > new.one

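--pipe is comparatively slow, because a single process has to read all the input and hand it out to the jobs. If speed is of the essence, use --pipe-part instead with a decent block size; it needs a real (seekable) file given with -a: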

parallel -a old.one -k --block 30M --pipe-part 'sed -r "s/\./\,/g"' > new.one
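
Both parallel variants can be checked against the sequential result with cmp. The following is only a sketch: the generated sample file, the temporary directory, and the small 100k block size are assumptions chosen so that a small input still gets split into several chunks. If GNU parallel is not installed, the check is skipped.

```shell
#!/bin/sh
# Hypothetical sample: 200000 lines such as "1.5", standing in for old.one.
tmp=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 200000; i++) print i ".5" }' > "$tmp/old.one"

# Sequential reference output.
sed 's/\./,/g' "$tmp/old.one" > "$tmp/seq.out"

pipe_ok=skipped
part_ok=skipped
# Only run if the installed "parallel" really is GNU parallel.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # --pipe: chunks of stdin are sent to the jobs, -k keeps their order.
    parallel -k --pipe --block 100k 'sed "s/\./,/g"' \
        < "$tmp/old.one" > "$tmp/pipe.out"
    # --pipe-part: the jobs read their parts directly from the file.
    parallel -a "$tmp/old.one" -k --block 100k --pipe-part 'sed "s/\./,/g"' \
        > "$tmp/part.out"
    cmp -s "$tmp/seq.out" "$tmp/pipe.out" && pipe_ok=yes || pipe_ok=no
    cmp -s "$tmp/seq.out" "$tmp/part.out" && part_ok=yes || part_ok=no
fi
echo "--pipe matches sequential:      $pipe_ok"
echo "--pipe-part matches sequential: $part_ok"

rm -rf "$tmp"
```

Dropping -k from either command and rerunning is a simple way to see the reordering from the question, although with a small file the chunks may still happen to finish in order.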
Ole Tange answered Sep 30 '22 13:09