Why does a String-casting loop seem to have a static overhead?

Question

Background

I've been running a code (posted at the bottom) to measure performance of explicit Java downcasting, and I've run into what I feel like is a bit of an anomaly... or perhaps two anomalies.

I have already looked at this thread on Java casting overhead, but it seemed to only talk about casting in general, rather than this particular phenomenon. This thread covered similar topics, and I don't really need advice on premature optimization--I'm tuning parts of my application for maximum performance, so this is the logical step.

The tests

I basically wanted to test the performance of downcasting versus the .toString() method for objects which were Strings, but were typed as Objects. So, I created a String a and an Object b with equivalent content, ran three loops, and timed them.

Loop 1 is ((String) b).toLowerCase();
Loop 2 is b.toString().toLowerCase();
and Loop 3 is a.toLowerCase().

Test results

(Measurements in milliseconds.)

   iters   |  Test Round  |  Loop 1  |  Loop 2  |  Loop 3
-----------|--------------|----------|----------|----------
50,000,000 |      1       |   3367   |   3166   |   3186
  Test A   |      2       |   3543   |   3158   |   3156
           |      3       |   3365   |   3155   |   3169
-----------|--------------|----------|----------|----------
 5,000,000 |      1       |    373   |    348   |    369
  Test B   |      2       |    373   |    348   |    370
           |      3       |    399   |    334   |    371
-----------|--------------|----------|----------|----------
  500,000  |      1       |    66    |    36    |    33
  Test C   |      2       |    71    |    36    |    41
           |      3       |    66    |    35    |    34
-----------|--------------|----------|----------|----------
  50,000   |      1       |    27    |     5    |     5
  Test D   |      2       |    27    |     6    |     5
           |      3       |    26    |     5    |     5
-----------|--------------|----------|----------|----------

Code used for tests

long t, iters = ...;

String a = "String", c;
Object b = "String";

t = System.currentTimeMillis();
for (int i = 0; i < iters; i++) {
    c = ((String) b).toLowerCase();
}
System.out.println(System.currentTimeMillis() - t);

t = System.currentTimeMillis();
for (int i = 0; i < iters; i++) {
    c = b.toString().toLowerCase();
}
System.out.println(System.currentTimeMillis() - t);

t = System.currentTimeMillis();
for (int i = 0; i < iters; i++) {
    c = a.toLowerCase();
}
System.out.println(System.currentTimeMillis() - t);

Finally, the question

What I find most fascinating is that loop 2 (.toString()) seemed to perform the best of the three (particularly in Test B)--which doesn't make intuitive sense. Why would calling .toString() be faster than already having a String object?

The other thing that bothers me is that it does not scale. If we compare Tests A and D, they are off by a factor of 9 when compared against each other (27 * 1000 = 27000, not 3000); why is there this huge discrepancy as the number of iterations scales?

Can anyone offer an explanation as to why these two anomalies have proven to be true?

The (weird) reality

Update: Upon advice of Bruno Reis's solution, I ran my benchmarks again with some compiler output. The first loop was crammed with initialization stuff, so I put in a "junk" loop to do this. Once that was done, the results were closer to expected.

Here is a full output from the console using 5,000,000 iterations (commented by me):

     50    1             java.lang.String::toLowerCase (472 bytes)
     50    2             java.lang.CharacterData::of (120 bytes)
     53    3             java.lang.CharacterDataLatin1::getProperties (11 bytes)
     53    4             java.lang.Character::toLowerCase (9 bytes)
     54    5             java.lang.CharacterDataLatin1::toLowerCase (39 bytes)
     67    6     n       java.lang.System::arraycopy (0 bytes)   (static)
     68    7             java.lang.Math::min (11 bytes)
     68    8             java.util.Arrays::copyOfRange (63 bytes)
     69    9             java.lang.String::toLowerCase (8 bytes)
     69   10             java.util.Locale::getDefault (13 bytes)
     70    1 %           Main::main @ 14 (175 bytes)
[GC 49088K->360K(188032K), 0.0007670 secs]
[GC 49448K->360K(188032K), 0.0024814 secs]
[GC 49448K->328K(188032K), 0.0005422 secs]
[GC 49416K->328K(237120K), 0.0007519 secs]
[GC 98504K->352K(237120K), 0.0122388 secs]
[GC 98528K->352K(327552K), 0.0005734 secs]
    595    1 %           Main::main @ -2 (175 bytes)   made not entrant
548 /****** Junk Loop ******/
    597    2 %           Main::main @ 61 (175 bytes)
[GC 196704K->356K(327552K), 0.0008460 secs]
[GC 196708K->388K(523968K), 0.0005100 secs]
343 /****** Loop 1 ******/
    939    2 %           Main::main @ -2 (175 bytes)   made not entrant
    940   11             java.lang.String::toString (2 bytes)
    940    3 %           Main::main @ 103 (175 bytes)
[GC 393092K->356K(523968K), 0.0036496 secs]
377 /****** Loop 2 ******/
   1316    3 %           Main::main @ -2 (175 bytes)   made not entrant
   1317    4 %           Main::main @ 145 (175 bytes)
[GC 393060K->332K(759680K), 0.0008326 secs]
320 /****** Loop 3 ******/

Bruno Reis · Accepted Answer

The benchmark is flawed, as most questions in SO and elsewhere related to benchmarking Java code. You are measuring much much much more things than you imagine, such as the JIT compiling methods, HotSpot optimizing loops, etc.

Check http://www.ibm.com/developerworks/java/library/j-jtp02225/index.html .

Also, server VM and client VM behave differently (the JVM starts up faster in client but runs slower for some time, because it starts interpreting the bytecode while it gets compiled, while the server VM compiles it before running), etc.

The GC might also be interfering, more so if you get any Full GC during your benchmark (usually Full GCs completely pause every other thread). Even minor collections might have some impact, since they can use quite a bit of CPU to clean up a possibly huge mess created inside the loops.

To do a proper benchmark you should "warm up" the JVM, turn on lots of output from the JVM to be sure about what you are measuring, etc.

Check this question here on SO which addresses how to write benchmarks in Java, including the topics I mentioned above and much more in great detail: How do I write a correct micro-benchmark in Java?

Stephen C · Answer

Why would calling .toString() be faster than already having a String object?

Looking at the numbers, I don't see that Loop 2 is consistently faster than Loop3 at all. Indeed, in some cases it is slower. The apparent significant difference in Test B could simply be that the GC runs once more in the Loop 3 case than the Loop 2 case. This could simply be an artefact of the benchmark design.

Anyway, if you really want to find out what (if anything) is going on, you need to look at the native instructions that are generated by the JIT compiler in each case. (There are JVM options to do this ...)

Why does a String-casting loop seem to have a static overhead?

Tags:

java

performance

casting

Background

The tests

Test results

Code used for tests

Finally, the question

The (weird) reality

Cat

2 Answers

Bruno Reis

Stephen C

Recent Activity

Donate For Us

Why does a String-casting loop seem to have a static overhead?

Tags:

java

performance

casting

Background

The tests

Test results

Code used for tests

Finally, the question

The (weird) reality

Cat

2 Answers

Bruno Reis

Stephen C

Related questions

Recent Activity

Donate For Us