Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java File IO performance dropped over 30% when adding lots of JARs into classpath

Tags:

java

file

io

Test Code:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Map.Entry;

public class ReadLine {

    /**
     * @param args
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {
        System.getenv();
        System.getProperties();

        BufferedReader br = new BufferedReader(new FileReader(args[0]), 2 << 17);
        int lineTotal = 0;
        int lineDone = 0;
        long start = System.currentTimeMillis();
        long totalSincePre = System.currentTimeMillis();
        while (br.readLine() != null) {
            lineTotal++;
            if (lineTotal % 100000 == 0) {
                long end = System.currentTimeMillis();
                System.out.println("total speed=" + lineTotal / (end - totalSincePre) + "k/s. curr speed="
                        + (lineTotal - lineDone) / (end - start));
                start = end;
                lineDone = lineTotal;
            }
        }
        printEnv();
    }

    static void printEnv() {

        for (Entry<?, ?> e : System.getenv().entrySet()) {
            System.out.println(e.getKey() + ":" + e.getValue());
        }

        for (Entry<?, ?> e : System.getProperties().entrySet()) {
            System.out.println(e.getKey() + ":" + e.getValue());
        }
    }
}

Test Env: OS: Linux, File Size: 7.2G (csv text file, Over 1k per line), java version "1.6.0_32"

Test Run:

The code is packaged into one jar. I ran following two test on the same host, using the same jar, reading the same file.

1) Run ./java -cp=my.jar ReadLine TestFile.txt The performance stabilized at about 150k line /s.

Console output is like:

total speed=251k/s. curr speed=251
total speed=304k/s. curr speed=384
total speed=323k/s. curr speed=371
total speed=337k/s. curr speed=387
total speed=350k/s. curr speed=414
total speed=358k/s. curr speed=401
total speed=363k/s. curr speed=395
total speed=349k/s. curr speed=277
total speed=304k/s. curr speed=150
total speed=277k/s. curr speed=153
total speed=258k/s. curr speed=154
total speed=244k/s. curr speed=152
total speed=233k/s. curr speed=152
total speed=225k/s. curr speed=154
total speed=218k/s. curr speed=153
total speed=196k/s. curr speed=149
total speed=193k/s. curr speed=146
......stabled.......
total speed=163k/s. curr speed=150
total speed=162k/s. curr speed=155
total speed=162k/s. curr speed=155
total speed=162k/s. curr speed=149
total speed=162k/s. curr speed=147
total speed=162k/s. curr speed=150
total speed=161k/s. curr speed=148
total speed=161k/s. curr speed=145
total speed=161k/s. curr speed=151
total speed=161k/s. curr speed=154
total speed=161k/s. curr speed=163
total speed=161k/s. curr speed=179

2) No code changes, just added JARs (used in production environment) into classpath (needed in production, but not it this program), like ./java -cp=my.jar:hundreds_of_other_jars ReadLine TestFile.txt. The performance dropped to about 90k line /s

total speed=312k/s. curr speed=383
total speed=335k/s. curr speed=393
total speed=348k/s. curr speed=395
total speed=361k/s. curr speed=423
total speed=369k/s. curr speed=414
total speed=374k/s. curr speed=404
total speed=342k/s. curr speed=214
total speed=264k/s. curr speed=93
total speed=224k/s. curr speed=95
total speed=200k/s. curr speed=95
total speed=182k/s. curr speed=94
total speed=170k/s. curr speed=94
total speed=161k/s. curr speed=95
total speed=154k/s. curr speed=95
total speed=148k/s. curr speed=93
.....stabled.....
total speed=139k/s. curr speed=92
total speed=135k/s. curr speed=92
total speed=132k/s. curr speed=92
total speed=129k/s. curr speed=92
total speed=127k/s. curr speed=92
total speed=125k/s. curr speed=90
total speed=123k/s. curr speed=91
total speed=121k/s. curr speed=92
total speed=120k/s. curr speed=89
total speed=118k/s. curr speed=92
total speed=117k/s. curr speed=91
total speed=116k/s. curr speed=91
total speed=115k/s. curr speed=91
total speed=114k/s. curr speed=90
total speed=113k/s. curr speed=91

My analysis:

The ONLY difference is classpath. The second test's classpath has hundreds of JARs in classpath. But none of them is used in this program.

  • Not environment related. This code is packaged into the JAR and both tests used the same JAR. Both tests run on the same host, read the same file, use the same code. I also compared the System.getEnv and System.getProperties, no difference other than classpath.
  • Not OS cache. This can be reproduced. After many tests, the results are the same. No matter which test ran first.
  • jmap shows the memory usage has no big difference and all generations are not highly used.
  • jstack shows the call stack for both test is most likely to be as

this

at java.io.FileInputStream.available(Native Method)
at sun.nio.cs.StreamDecoder.inReady(StreamDecoder.java:343)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:304)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158)
 - locked <0xb4220388> (a java.io.FileReader)
at java.io.InputStreamReader.read(InputStreamReader.java:167)
at java.io.BufferedReader.fill(BufferedReader.java:136)
at java.io.BufferedReader.readLine(BufferedReader.java:299)
- locked <0xb4220388> (a java.io.FileReader)
at java.io.BufferedReader.readLine(BufferedReader.java:362)
at com.amazon.invhealth.metrics.transform.topasin.RL.main(RL.java:24)

or

at sun.nio.cs.UTF_8$Decoder.decodeArrayLoop(UTF_8.java:240)
at sun.nio.cs.UTF_8$Decoder.decodeLoop(UTF_8.java:305)
at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:544)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:298)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158)
- locked <0xb4220388> (a java.io.FileReader)
at java.io.InputStreamReader.read(InputStreamReader.java:167)
at java.io.BufferedReader.fill(BufferedReader.java:136)
at java.io.BufferedReader.readLine(BufferedReader.java:299)
- locked <0xb4220388> (a java.io.FileReader)
at java.io.BufferedReader.readLine(BufferedReader.java:362)
at com.amazon.invhealth.metrics.transform.topasin.RL.main(RL.java:24)

Looking at the call stack these test cases are likely using the same code.

  • This is not caused by a certain JAR in the classpath. I tried to remove the first 50% of the classpath to run the test, the performance is about 110k line/s. Then I remove the last 50% to run the test, the performance is also about 110k line/s. If remove over 2/3 jars from classpath, the performance is about 120k line/s. So from the test, this performance issue just related to how many JARs are in the classpath.
  • Then I tried to package all these JARs into one big JAR. Sadly the performance dropped from 90k to 60k.... So to be precise, based on my test, this performance drop is caused by how many classes are in the classpath.
  • I ran the two tests on a different machine with a different file (file size and format are similar), the results are the same. So this can definitely be reproduced.

BUT I think this doesn't make sense. Did I miss anything? If this indeed is true what is the root cause?

----------More Debug-----------------

GC and Perm Size

Added -Xmx2432m -Xms256m -XX:MaxNewSize=700m -XX:MaxPermSize=1024m -XX:+PrintGCDetails to both tests. There are all just have PSYoungGen GC. And the performance is the same for both.

Output Details

Long CP one:

total speed=114k/s. curr speed=91
[GC [PSYoungGen: 247888K->16K(238272K)] 248810K->938K(413056K), 0.0003290 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
total speed=113k/s. curr speed=92
[GC [PSYoungGen: 238096K->16K(228864K)] 239018K->938K(403648K), 0.0003840 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
total speed=113k/s. curr speed=92
[GC [PSYoungGen: 228816K->16K(220096K)] 229738K->938K(394880K), 0.0006030 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
total speed=112k/s. curr speed=92
[GC [PSYoungGen: 219984K->16K(211584K)] 220906K->938K(386368K), 0.0004380 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
total speed=111k/s. curr speed=93
[GC [PSYoungGen: 211536K->16K(203584K)] 212458K->938K(378368K), 0.0005160 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
total speed=111k/s. curr speed=92
[GC [PSYoungGen: 203472K->16K(195840K)] 204394K->938K(370624K), 0.0005920 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
total speed=110k/s. curr speed=94
[GC [PSYoungGen: 195792K->16K(188608K)] 196714K->938K(363392K), 0.0004010 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
[GC [PSYoungGen: 188496K->16K(181568K)] 189418K->938K(356352K), 0.0004440 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 

......................

Heap
 PSYoungGen      total 145984K, used 81767K [0xc8560000, 0xd7780000, 0xf4160000)
  eden space 145920K, 56% used [0xc8560000,0xcd535d18,0xd13e0000)
  from space 64K, 25% used [0xd7760000,0xd7764000,0xd7770000)
  to   space 64K, 0% used [0xd7770000,0xd7770000,0xd7780000)
 PSOldGen        total 174784K, used 922K [0x5c160000, 0x66c10000, 0xc8560000)
  object space 174784K, 0% used [0x5c160000,0x5c246ae8,0x66c10000)
 PSPermGen       total 16384K, used 2032K [0x1c160000, 0x1d160000, 0x5c160000)
  object space 16384K, 12% used [0x1c160000,0x1c35c260,0x1d160000)

One Jar CP:

total speed=180k/s. curr speed=148
[GC [PSYoungGen: 87248K->16K(87296K)] 87904K->672K(262080K), 0.0005300 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
[GC [PSYoungGen: 87248K->16K(87296K)] 87904K->672K(262080K), 0.0004950 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
[GC [PSYoungGen: 87248K->16K(87296K)] 87904K->672K(262080K), 0.0005020 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
total speed=179k/s. curr speed=150
[GC [PSYoungGen: 87248K->16K(87296K)] 87904K->672K(262080K), 0.0005360 secs] [Times: user=0.01 sys=0.00, real=0.00 secs] 
[GC [PSYoungGen: 87248K->16K(87296K)] 87904K->672K(262080K), 0.0005190 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
total speed=178k/s. curr speed=151
[GC [PSYoungGen: 87248K->16K(87296K)] 87904K->672K(262080K), 0.0005360 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
[GC [PSYoungGen: 87248K->16K(87296K)] 87904K->672K(262080K), 0.0005400 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
[GC [PSYoungGen: 87248K->16K(87296K)] 87904K->672K(262080K), 0.0003510 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
total speed=177k/s. curr speed=150

Heap
 PSYoungGen      total 87296K, used 83826K [0xc8580000, 0xcdad0000, 0xf4180000)
  eden space 87232K, 96% used [0xc8580000,0xcd758928,0xcdab0000)
  from space 64K, 25% used [0xcdab0000,0xcdab4000,0xcdac0000)
  to   space 64K, 0% used [0xcdac0000,0xcdac0000,0xcdad0000)
 PSOldGen        total 174784K, used 656K [0x5c180000, 0x66c30000, 0xc8580000)
  object space 174784K, 0% used [0x5c180000,0x5c224080,0x66c30000)
 PSPermGen       total 16384K, used 2022K [0x1c180000, 0x1d180000, 0x5c180000)
  object space 16384K, 12% used [0x1c180000,0x1c379bb0,0x1d180000)

JVM Initial

I am very expecting this is the cause because this is reasonable. But after using the following code:

String filepath = args[0];
while (true) {
    BufferedReader br = new BufferedReader(new FileReader(filepath), 2 << 17);
    System.out.println("Press Enter to start...");
    while (System.in.read() != '\n')
        ;
    int lineTotal = 0;
    int linePre = 0;
    long start = System.currentTimeMillis();
    long totalStart = System.currentTimeMillis();
    while (br.readLine() != null) {
        lineTotal++;
        if (lineTotal % 100000 == 0) {
            long end = System.currentTimeMillis();
            System.out.println("total speed=" + lineTotal / (end - totalStart) + "k/s. curr speed="
                    + (lineTotal - linePre) / (end - start));
            start = end;
            linePre = lineTotal;
        }
    }
}

Run and press enter main times, the performance doesn't change.

like image 409
DeepNightTwo Avatar asked Apr 17 '13 06:04

DeepNightTwo


People also ask

How do I resolve a classpath problem?

You have 3 solutions: add this class in the path of your other compiled classes (respecting the package naming of your directories) add the root directory of this class in your classpath (in your case "C:\java\project\") add this single class into a jar and add this jar to the classpath.

How do you specify classpath in manifest MF?

By using the Class-Path header in the manifest, you can avoid having to specify a long -classpath flag when invoking Java to run your application. Note: The Class-Path header points to classes or JAR files on the local network, not JAR files within the JAR file or classes accessible over Internet protocols.


1 Answers

I found the solution but still don't know why. I run the test again using JDK 1.7.0_21, the performance of two test cases are the same, stabled at 180k line / second.

As mentioned in the question, another test is using JDK 1.6.0_37, it has performance improve but still has the gap.

So, upgrade JDK if possible.

I will be very appreciated if someone knows the root cause and tell me why :-)

like image 97
DeepNightTwo Avatar answered Oct 07 '22 06:10

DeepNightTwo