 

Unix sort command takes much longer depending on where it is executed?! (fastest from ProcessBuilder in program run from IDE, slowest from terminal)

I have a Java program that uses ProcessBuilder to call the Unix sort command. When I run it within my IDE (IntelliJ), sorting 500,000 lines takes only about a second. When I package it into an executable jar and run that from the terminal, it takes about 10 seconds. When I run the sort command myself from the terminal, it takes 20 seconds!

Why is there such a vast difference in performance, and is there any way to get the jar to run as fast as it does from the IDE? The environment is OS X 10.6.8 with Java 1.6.0_26. The bottom of the sort man page says "sort 5.93 November 2004".

The command it is executing is:

     sort -t'    ' -k5,5f -k4,4f -k1,1n /path/to/input/file -o /path/to/output/file

Note that when I run sort from the terminal I need to manually escape the tab delimiter and use the argument -t$'\t' instead of the actual tab (which I can pass to ProcessBuilder).
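The Java code itself is not shown in the question, so the following is only a hypothetical reconstruction (the class name SortCommand and the file paths are placeholders) of how such a call might look. Because ProcessBuilder hands each list element to the process as one argument, with no shell in between, the tab delimiter can be written as a plain Java "\t" and needs no $'\t' quoting. The main method also times the call, which gives one consistent way to compare the IDE run against the jar run; it sticks to Java 6 APIs to match the environment in the question.

```java
import java.io.InputStream;
import java.util.Arrays;
import java.util.List;

// Hypothetical reconstruction of the question's call; paths come from args.
final class SortCommand {
    private SortCommand() {}

    // Each list element is passed to sort as one argv entry. No shell is
    // involved, so the tab is a literal '\t' and needs no $'\t' quoting.
    static List<String> build(String input, String output) {
        return Arrays.asList(
                "sort",
                "-t\t",    // field separator: one literal tab character
                "-k5,5f",  // primary key: field 5, case-folded
                "-k4,4f",  // secondary key: field 4, case-folded
                "-k1,1n",  // tertiary key: field 1, numeric
                input,
                "-o", output);
    }

    public static void main(String[] args) throws Exception {
        ProcessBuilder pb = new ProcessBuilder(build(args[0], args[1]));
        pb.redirectErrorStream(true); // merge stderr into stdout so one loop drains both
        long start = System.nanoTime();
        Process p = pb.start();
        InputStream out = p.getInputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = out.read(buf)) != -1) {
            System.out.write(buf, 0, n); // surface any error message from sort
        }
        out.close();
        int exit = p.waitFor();
        long millis = (System.nanoTime() - start) / 1000000L;
        System.out.println("exit=" + exit + ", elapsed=" + millis + " ms");
    }
}
```

For example, run java SortCommand /path/to/input/file /path/to/output/file once from the IDE and once from the terminal, and compare the reported elapsed times.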

Looking at ps, everything seems the same, except that when run from the IDE the sort command has a TTY of ?? instead of ttys000. But from this question I don't think that should make a difference. Perhaps bash is slowing me down? I am running out of ideas and want to close this 20x performance gap!
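By default, ps does not show a process's environment variables, so two sort processes can look identical there while running under different locale or PATH settings. A minimal diagnostic sketch (EnvReport and its findOnPath helper are hypothetical names, not part of any library) is to print the variables that most plausibly differ, run it both from the IDE and from the terminal, and compare the output:

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

// Prints environment settings that may differ between an IDE-launched JVM
// and a terminal-launched one. Run it both ways and compare the output.
final class EnvReport {
    private EnvReport() {}

    // Variables that influence which sort binary runs and how it collates.
    static final List<String> KEYS =
            Arrays.asList("LANG", "LC_ALL", "LC_COLLATE", "LC_CTYPE", "PATH");

    // Returns the first PATH entry containing an executable file called name
    // (roughly what `which` would report), or null if none is found.
    static String findOnPath(String name, String path) {
        if (path == null) {
            return null;
        }
        for (String dir : path.split(File.pathSeparator)) {
            File candidate = new File(dir, name);
            if (candidate.isFile() && candidate.canExecute()) {
                return candidate.getAbsolutePath();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        for (String key : KEYS) {
            System.out.println(key + "=" + System.getenv(key)); // prints "null" if unset
        }
        System.out.println("sort -> " + findOnPath("sort", System.getenv("PATH")));
    }
}
```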

asked Aug 19 '11 at 16:08 by Aaron Silverman



1 Answer

I'm going to venture two guesses:

  • perhaps you are invoking different versions of sort? Run "which sort" in each environment, then repeat the comparison using the full absolute path.

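  • perhaps your locale settings differ between the two environments? In a UTF-8 locale, sort uses locale-aware collation and case folding, which is considerably slower than the plain byte comparison it does in the C locale. Try

     export LC_ALL=C
     sort -t$'\t' -k5,5f -k4,4f -k1,1n /input/file -o /output/file

to compare the timings. (LC_ALL overrides LANG and any LC_* variables, so it is the safer one to set.)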
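Both guesses can also be applied from inside the Java program, so the jar no longer depends on the settings of whatever launched it. This is an illustrative sketch, not code from the answer: the class name CLocaleSort is hypothetical, and /usr/bin/sort is an assumed location that should be replaced with whatever "which sort" reports. Setting LC_ALL=C through ProcessBuilder.environment() affects only the child sort process, not the JVM or the user's shell.

```java
import java.io.InputStream;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Sketch applying both of the answer's guesses from inside Java.
// "/usr/bin/sort" is an assumed path; use whatever `which sort` reports.
final class CLocaleSort {
    private CLocaleSort() {}

    static ProcessBuilder configure(String sortPath, String input, String output) {
        List<String> cmd = Arrays.asList(
                sortPath, "-t\t", "-k5,5f", "-k4,4f", "-k1,1n", input, "-o", output);
        ProcessBuilder pb = new ProcessBuilder(cmd);
        // environment() is a mutable copy of this JVM's environment that only
        // the child process will see; LC_ALL overrides LANG and all LC_* values.
        Map<String, String> env = pb.environment();
        env.put("LC_ALL", "C");
        return pb;
    }

    public static void main(String[] args) throws Exception {
        ProcessBuilder pb = configure("/usr/bin/sort", args[0], args[1]);
        pb.redirectErrorStream(true);
        Process p = pb.start();
        InputStream out = p.getInputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = out.read(buf)) != -1) {
            System.out.write(buf, 0, n); // drain so sort never blocks on a full pipe
        }
        out.close();
        System.out.println("exit=" + p.waitFor());
    }
}
```

Note that the C locale compares raw bytes, so if the file contains non-ASCII text the resulting order may differ from the locale-aware order.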

answered Sep 27 '22 at 02:09 by sehe
