Why is sort (Linux command) slow when reading from a pipe?

I have a large text file of ~8GB on which I need to do some simple filtering and then sort all the rows. I am on a 28-core machine with an SSD and 128GB of RAM. I have tried:

Method 1

awk '...' myBigFile | sort --parallel=56 > myBigFile.sorted

Method 2

awk '...' myBigFile > myBigFile.tmp
sort --parallel 56 myBigFile.tmp > myBigFile.sorted
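For the timing comparison to be fair, both methods should produce byte-identical output. The script below is a minimal sketch of that check, assuming bash and GNU coreutils (`sort --parallel`, `nproc`, `mktemp`). Because the real awk filter is elided, it uses a hypothetical stand-in filter (`awk '$1 % 2 == 0'`, keeping even numbers) on a small synthetic file generated with seq.

```shell
#!/usr/bin/env bash
# Sketch: confirm Method 1 (piped) and Method 2 (temp file) give the same result.
# Assumptions: GNU coreutils; the filter below is a hypothetical stand-in
# for the elided awk '...' in the question.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

input="$workdir/myBigFile"
seq 1 100000 > "$input"            # small synthetic stand-in for the ~8GB file

# Hypothetical filter standing in for awk '...': keep even numbers only.
filter() { awk '$1 % 2 == 0' "$1"; }

threads=$(nproc)                   # the question used 56 on a 28-core machine

# Method 1: filter output piped straight into sort.
filter "$input" | sort --parallel="$threads" > "$workdir/method1.sorted"

# Method 2: write the filtered rows to a temporary file, then sort that file.
filter "$input" > "$workdir/myBigFile.tmp"
sort --parallel="$threads" "$workdir/myBigFile.tmp" > "$workdir/method2.sorted"

# Compare the two outputs byte for byte.
if cmp -s "$workdir/method1.sorted" "$workdir/method2.sorted"; then
    echo "identical"
else
    echo "different"
fi
```

If it prints `identical`, the only difference between the two methods is whether sort reads from a pipe or from a regular file.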

Surprisingly, Method 1 takes 11.5 min, while Method 2 takes less than 2 min in total (0.75 min for awk plus 1 min for sort). Why is sorting so slow when its input is piped? Is it not running in parallel?

EDIT

awk and myBigFile are not important: the experiment can be reproduced simply with seq 1 10000000 | sort --parallel 56 (thanks to @Sergei Kurenkov), and on my machine I also observed a six-fold speedup with the un-piped version.
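The seq reproduction can be scripted so that both variants sort the same data and report wall-clock time next to CPU time. If user plus sys time is clearly larger than real time, several sort threads were busy at once, which bears directly on "Is it not running in parallel?". This is a sketch assuming bash (for the `time` keyword and `TIMEFORMAT`) and GNU coreutils; `nproc` replaces the hard-coded 56 so it runs on other machines, and the piped timing also includes the time spent in seq.

```shell
#!/usr/bin/env bash
# Sketch: time sort on piped input vs. the same data read from a regular file.
# Assumptions: bash (time keyword, TIMEFORMAT) and GNU coreutils.
set -euo pipefail

n=${N:-10000000}                 # default matches the EDIT; override with N=...
threads=$(nproc)                 # the question used 56 on a 28-core machine
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT

# %R = elapsed seconds, %U / %S = user / system CPU seconds.
# (user + sys) well above real suggests several threads were working in parallel.
TIMEFORMAT='real %R s, user %U s, sys %S s'

echo "piped input:"
time (seq 1 "$n" | sort --parallel="$threads" > /dev/null)

# Same data, but written to a regular file first.
seq 1 "$n" > "$tmp"
echo "file input:"
time (sort --parallel="$threads" "$tmp" > /dev/null)
```

For a quick run, invoke it as `N=1000000 bash script.sh`; the default reproduces the EDIT's input size.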
