Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why using pipe for sort (linux command) is slow?

Tags:

I have a large text file of ~8GB which I need to do some simple filtering and then sort all the rows. I am on a 28-core machine with SSD and 128GB RAM. I have tried

Method 1

awk '...' myBigFile | sort --parallel = 56 > myBigFile.sorted

Method 2

awk '...' myBigFile > myBigFile.tmp
sort --parallel 56 myBigFile.tmp > myBigFile.sorted

Surprisingly, method1 takes 11.5 min while method2 only takes (0.75 + 1 < 2) min. Why is sorting so slow when piped? Is it not paralleled?

EDIT

awk and myBigFile is not important, this experiment is repeatable by simply using seq 1 10000000 | sort --parallel 56 (thanks to @Sergei Kurenkov), and I also observed a six-fold speed improvement using un-piped version on my machine.