I have a Java program that uses ProcessBuilder to call the Unix sort command. When I run this code within my IDE (IntelliJ), it takes only about a second to sort 500,000 lines. When I package it into an executable jar and run that from the terminal, it takes about 10 seconds. When I run the sort command myself from the terminal, it takes 20 seconds!
Why is there such a vast difference in performance, and is there any way I can get the jar to run as fast as it does in the IDE? The environment is OS X 10.6.8 and Java 1.6.0_26. The bottom of the sort man page says "sort 5.93 November 2004".
The command it is executing is:
sort -t' ' -k5,5f -k4,4f -k1,1n /path/to/input/file -o /path/to/output/file
Note that when I run sort from the terminal, I need to escape the tab delimiter manually and use the argument -t$'\t' instead of the actual tab character (which I can pass to ProcessBuilder directly).
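To show why no escaping is needed on the Java side, here is a minimal sketch of the ProcessBuilder call. ProcessBuilder starts the program directly without going through a shell, so the Java string "\t" reaches sort as a real tab character. The class and method names (TabSort, sortCommand, runSort) are hypothetical, and the arguments simply mirror the command shown above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.List;

class TabSort {
    // Argument list for sort. ProcessBuilder hands each element to the
    // process unchanged (no shell is involved), so "\t" here is a real
    // tab character and needs no $'\t' quoting.
    static List<String> sortCommand(String input, String output) {
        return Arrays.asList(
                "sort",
                "-t\t",                     // field separator: literal tab
                "-k5,5f", "-k4,4f", "-k1,1n",
                input,
                "-o", output);
    }

    // Runs sort, forwards anything it prints (normally only error
    // messages, since the sorted result goes to the -o file), and
    // returns sort's exit status.
    static int runSort(String input, String output)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(sortCommand(input, output));
        pb.redirectErrorStream(true);
        Process p = pb.start();
        InputStream in = p.getInputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            System.err.write(buf, 0, n);
        }
        in.close();
        return p.waitFor();
    }
}
```

Draining the child's output before calling waitFor() keeps sort from stalling on a full pipe if it ever prints a lot of error text.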
Looking at ps, everything seems the same, except that when run from the IDE the sort command has a TTY of ?? instead of ttys000, but from this question I don't think that should make a difference. Perhaps bash is slowing me down? I am running out of ideas and want to close this 20x performance gap!
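Since ps only shows the TTY, one way to compare the IDE and terminal launches more closely is to log the environment the child process will inherit and time the run itself. The sketch below is a hypothetical helper (SortTimer, describeLocale and timeCommand are illustrative names); it prints the locale-related variables that sort consults, then measures wall-clock time with System.nanoTime(). Running the same jar both ways and diffing the printed lines would show whether sort receives different settings.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.List;
import java.util.Map;

class SortTimer {
    // Locale-related variables that affect how sort compares lines.
    static final String[] LOCALE_VARS = {"LANG", "LC_ALL", "LC_COLLATE", "LC_CTYPE"};

    // Formats the locale variables as "NAME=value" pairs; unset
    // variables are shown as "null".
    static String describeLocale(Map<String, String> env) {
        StringBuilder sb = new StringBuilder();
        for (String name : LOCALE_VARS) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(name).append('=').append(env.get(name));
        }
        return sb.toString();
    }

    // Logs the inherited locale, runs the command while discarding its
    // output, and returns the elapsed wall-clock time in milliseconds.
    static long timeCommand(List<String> command)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true);
        System.err.println(describeLocale(pb.environment()));
        long start = System.nanoTime();
        Process p = pb.start();
        InputStream in = p.getInputStream();
        byte[] buf = new byte[8192];
        while (in.read(buf) != -1) {
            // discard output; only the elapsed time matters here
        }
        in.close();
        p.waitFor();
        return (System.nanoTime() - start) / 1000000L;
    }
}
```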
The sort command prints the lines of a file (or the output of another command) in a specified order, which makes the data easier to read.
By default, sort compares lines as text, line by line, using the collation rules of the current locale (plain byte order only in the C/POSIX locale).
To sort numerically instead, pass the -n option: lines are ordered from the lowest number to the highest, and the result is written to standard output.
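To illustrate why -n matters, here is a small Java sketch that sorts the same strings both as text and as numbers, using the standard library rather than the sort command. Text order uses String.compareTo, which matches sort's C-locale byte order only for ASCII input; the numeric comparator assumes every line is a plain integer, a simplification of what -n actually accepts. The names NumericVsText, textOrder and numericOrder are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class NumericVsText {
    // Text order: character by character, like sort without -n in the
    // C locale, so "10" sorts before "9" because '1' < '9'.
    static List<String> textOrder(List<String> lines) {
        List<String> copy = new ArrayList<String>(lines);
        Collections.sort(copy);
        return copy;
    }

    // Numeric order: compares the lines' integer values, roughly like
    // sort -n (assumes each line is a plain integer).
    static List<String> numericOrder(List<String> lines) {
        List<String> copy = new ArrayList<String>(lines);
        Collections.sort(copy, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                long x = Long.parseLong(a.trim());
                long y = Long.parseLong(b.trim());
                return x < y ? -1 : (x == y ? 0 : 1);
            }
        });
        return copy;
    }
}
```

For the input 10, 9, 100, 2, text order gives 10, 100, 2, 9, while numeric order gives 2, 9, 10, 100.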
I'm going to venture two guesses:
Perhaps you are invoking different versions of sort. Run which sort in each environment, and use the full absolute path when you compare again.
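Because the IDE and the terminal can hand the JVM different PATH values, the version check is worth doing from inside the Java program too. The sketch below is a hypothetical Java stand-in for which (WhichSort and which are illustrative names): it walks the PATH entries in order and returns the first executable file with the given name. As a simplification, it skips empty PATH entries, which a real shell would treat as the current directory.

```java
import java.io.File;

class WhichSort {
    // Returns the first executable file called `name` found in the
    // PATH-style string `path`, or null if there is none.
    static File which(String name, String path) {
        if (path == null) {
            return null;
        }
        for (String dir : path.split(File.pathSeparator)) {
            if (dir.length() == 0) {
                continue; // simplification: shells treat "" as "."
            }
            File candidate = new File(dir, name);
            if (candidate.isFile() && candidate.canExecute()) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Uses the PATH this JVM actually sees, which may differ
        // between an IDE launch and a terminal launch.
        File sort = which("sort", System.getenv("PATH"));
        System.out.println(sort == null ? "sort not found" : sort.getAbsolutePath());
    }
}
```

Once the path is known, passing it (for example /usr/bin/sort) as the first ProcessBuilder argument pins the exact binary regardless of PATH.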
Perhaps you are using more complicated locale settings (leading to more complicated character-set handling, etc.)? Try
export LANG=C
sort -t' ' -k5,5f -k4,4f -k1,1n /input/file -o /output/file
to compare the timings.
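The same locale experiment can be run from the Java side by changing the environment of the child process only. This is a sketch with hypothetical names (CLocaleSort, cLocaleSort). It goes one step beyond the shell example by also setting LC_ALL, because LC_ALL overrides LANG and the individual LC_* variables, so an inherited LC_ALL would otherwise win. Keep in mind that the C locale compares bytes, so for non-ASCII text the output order can differ from a locale-aware sort.

```java
import java.util.List;
import java.util.Map;

class CLocaleSort {
    // Builds a ProcessBuilder whose child environment forces the C
    // locale; the JVM's own environment is left untouched.
    static ProcessBuilder cLocaleSort(List<String> sortArgs) {
        ProcessBuilder pb = new ProcessBuilder(sortArgs);
        Map<String, String> env = pb.environment(); // copy of the JVM's environment
        env.put("LANG", "C");
        env.put("LC_ALL", "C"); // overrides LANG and any LC_* setting
        return pb;
    }
}
```

Calling start() on the returned builder runs sort with these settings; timing that run against one without them would isolate the locale's effect.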