We have an application that imports a large number of files by splitting the data and sorting it. When I run the JUnit test case in IntelliJ, the whole process takes about 16 minutes.
The same test, run with mvn clean test -Dtest=MyTest, takes 34 minutes.
We call out to /bin/sort to sort the files, and the sort step is what seems to take longer. I don't understand what is different.
Looking at IntelliJ it runs with
/Library/Java/JavaVirtualMachines/1.6.0_26-b03-383.jdk/Contents/Home/bin/java -Didea.launcher.port=7532 -Didea.launcher.bin.path=/Applications/IntelliJ IDEA 10.app/bin -Dfile.encoding=UTF-8 -classpath %classpath% com.intellij.rt.execution.application.AppMain com.intellij.rt.execution.junit.JUnitStarter -ideVersion5 -junit4 xxx.IntTestImportProcess,testImportProcess
I am on OS X. All the classes are injected using Spring. What are some possible suggestions or theories about what is behind this performance gain in IntelliJ? The tests are identical. I can't share all of the code because there is just so much of it, but I can add any detail if requested.
Here is my main class and how I am running both.
public static void main(String... args) throws IOException {
    if (args.length != 2) {
        System.out.println("Usage: \n java -jar client.jar spring.xml data_file");
        System.exit(1);
    }
    ApplicationContext applicationContext = new FileSystemXmlApplicationContext(args[0]);
    PeriodFormatter formatter = new PeriodFormatterBuilder()
            .appendMinutes()
            .appendSuffix("minute", "minutes")
            .appendSeparator(" and ")
            .appendSeconds()
            .appendSuffix("second", "seconds")
            .toFormatter();
    URI output = (URI) applicationContext.getBean("workingDirectory");
    File dir = new File(output);
    if (dir.exists()) {
        Files.deleteDirectoryContents(dir.getCanonicalFile());
    } else {
        dir.mkdirs();
    }
    ImportProcess importProcess = applicationContext.getBean(ImportProcess.class);
    long start = System.currentTimeMillis();
    File file = new File(args[1]);
    importProcess.beginImport(file);
    Period period = new Period(System.currentTimeMillis() - start); // in milliseconds
    System.out.println(formatter.print(period.toPeriod()));
}
I have decided to remove JUnit and just use a main() method. The results are exactly the same: IntelliJ is faster again. Here is the crazy log.
With IntelliJ
DEBUG [ main] 2011-08-18 13:05:16,259 [er.DelimitedTextUnixDataSorter] Sorting file [/Users/amirraminfar/Desktop/import-process/usage]
DEBUG [ main] 2011-08-18 13:06:09,546 [er.DelimitedTextUnixDataSorter] Sorting file [/Users/amirraminfar/Desktop/import-process/customer]
With java -jar
DEBUG [ main] 2011-08-18 12:10:16,726 [er.DelimitedTextUnixDataSorter] Sorting file [/Users/amirraminfar/Desktop/import-process/usage]
DEBUG [ main] 2011-08-18 12:15:55,893 [er.DelimitedTextUnixDataSorter] Sorting file [/Users/amirraminfar/Desktop/import-process/customer]
The sort command is
sort -t' ' -f -k32,32f -k18,18f -k1,1n
As you can see above, the sort takes about 1 minute in IntelliJ but more than 5 minutes with java -jar!
Update
I ran everything using /Library/Java/JavaVirtualMachines/1.6.0_26-b03-383.jdk/Contents/Home/bin/java and the sort still takes well over 5 minutes.
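When the same JVM binary gives different timings from two launchers, one thing worth comparing is the environment each launcher hands to the process, since child processes such as /bin/sort inherit it. The following is a minimal sketch of such a check, not part of the original application; the class name LocaleEnvDump and the method localeVariables are hypothetical names chosen for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical diagnostic helper: prints the locale-related environment
// variables that the JVM (and any child process it starts) inherits.
public class LocaleEnvDump {

    /** Returns only LANG and the LC_* entries of the given environment, sorted by name. */
    static Map<String, String> localeVariables(Map<String, String> env) {
        Map<String, String> result = new TreeMap<String, String>();
        for (Map.Entry<String, String> entry : env.entrySet()) {
            String name = entry.getKey();
            if (name.equals("LANG") || name.startsWith("LC_")) {
                result.put(name, entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Run this once from the IDE and once from the terminal, then compare.
        System.out.println("locale env:    " + localeVariables(System.getenv()));
        System.out.println("file.encoding: " + System.getProperty("file.encoding"));
    }
}
```

If the two runs print different values for LANG or any LC_* variable, the external sort is not running under the same conditions in both cases.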
Thank you everybody for helping. It turns out IntelliJ starts sort with LANG=C. The Mac OS X terminal defaults to a UTF-8 locale, so sort does locale-aware comparisons there instead of plain byte comparisons, which explains the performance loss. Hopefully this answer will help somebody.
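A minimal sketch of one way to make the Java side independent of the launcher's locale, assuming the sort is started through ProcessBuilder. The real DelimitedTextUnixDataSorter code is not shown in the question, so the class name LocaleSafeSort, the method newSortProcess and the -o output handling are assumptions made for illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: starts /bin/sort with the C locale forced,
// regardless of what the parent JVM inherited from the IDE or the terminal.
public class LocaleSafeSort {

    /** Builds (but does not start) a sort process that writes the sorted input to output. */
    static ProcessBuilder newSortProcess(File input, File output, String... sortArgs) {
        List<String> command = new ArrayList<String>();
        command.add("/bin/sort");
        command.addAll(Arrays.asList(sortArgs));
        command.add("-o");
        command.add(output.getPath());
        command.add(input.getPath());

        ProcessBuilder builder = new ProcessBuilder(command);
        // LC_ALL takes precedence over LANG and the individual LC_* variables.
        builder.environment().put("LC_ALL", "C");
        return builder;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.out.println("Usage: java LocaleSafeSort input_file output_file");
            return;
        }
        Process process = newSortProcess(new File(args[0]), new File(args[1]), "-k1,1n")
                .inheritIO() // Java 7+: pass sort's error messages through to the console
                .start();
        System.out.println("sort exited with " + process.waitFor());
    }
}
```

Note that the C locale orders by byte value, so data containing non-ASCII characters may come out in a different order than under a UTF-8 locale. Whether that is acceptable depends on what the import expects.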
Is mvn clean doing a rebuild of the project? Is the run under IDEA not doing that? Does building the project with Maven take 18 minutes (I wouldn't be surprised if it did, given that Maven is the absolute pits)?
If the answers to all these questions are 'yes', then I think you have a conclusion.
The solution is to take Maven to the woods, shoot it, then bury it in an unmarked grave.
A guess more than a substantiated answer:
A lot may depend on I/O buffering. A sort over 500K records is going to output a lot of data, so the right buffer size may matter a lot. I think the tty is typically line buffered, so it is going to do 500K read and write operations, whereas the IDE may simply read in much larger buffers.
Additionally, it is possible that OSX has process or I/O scheduling which heavily favours GUI apps over console ones (which could be detected through being bound to a tty), so it might be that you have to wait & idle a lot more time from the console than from within the IDE.