 

Java benchmarking - why is the second loop faster?

I'm curious about this.

I wanted to check which function was faster, so I wrote a small piece of code and ran it many times.

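```java
import org.apache.hadoop.hbase.util.Bytes; // HBase's Bytes utility ("Bytes class from hadoop")

// Enclosing class (name is illustrative) so the snippet compiles.
public class GetBytesBenchmark {

    public static void main(String[] args) {

        long ts;
        String c = "sgfrt34tdfg34";

        ts = System.currentTimeMillis();
        for (int k = 0; k < 10000000; k++) {
            c.getBytes();
        }
        System.out.println("t1->" + (System.currentTimeMillis() - ts));

        ts = System.currentTimeMillis();
        for (int i = 0; i < 10000000; i++) {
            Bytes.toBytes(c);
        }
        System.out.println("t2->" + (System.currentTimeMillis() - ts));

    }
}
```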

The second loop is faster, so I thought that the Bytes class from Hadoop was faster than the String method. Then I swapped the order of the loops, and c.getBytes() became the faster one. I ran it many times, and my conclusion was that, for some reason I don't understand, something happens in the VM after the first loop executes that makes the second loop faster.

Guille asked Dec 18 '13 10:12



2 Answers

This is a classic Java benchmarking issue. The HotSpot JVM (its JIT compiler and other runtime optimizations) compiles your code while you use it, so the same code gets faster as the run goes on.
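To see the warm-up effect directly, here is a minimal sketch that times the same batch of `getBytes` calls several times in a row and records each round. The class name `JitWarmupDemo`, the round counts, and the use of `System.nanoTime` (instead of the question's `currentTimeMillis`) are illustrative choices, not part of the original answer. On a typical HotSpot JVM the first round or two tend to be slower than later ones, but the exact numbers depend on the machine and JVM flags.

```java
class JitWarmupDemo {

    // Receives the accumulated result so the JIT cannot drop the calls as dead code.
    static volatile long sink;

    // Runs the same batch of work `rounds` times and returns each batch's
    // elapsed time in nanoseconds, in execution order.
    static long[] timeRounds(String s, int rounds, int callsPerRound) {
        long[] elapsed = new long[rounds];
        long total = 0;
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            for (int i = 0; i < callsPerRound; i++) {
                total += s.getBytes().length;
            }
            elapsed[r] = System.nanoTime() - start;
        }
        sink = total;
        return elapsed;
    }

    public static void main(String[] args) {
        long[] times = timeRounds("sgfrt34tdfg34", 10, 1_000_000);
        for (int r = 0; r < times.length; r++) {
            // Early rounds usually include interpreted execution and compilation time.
            System.out.println("round " + r + ": " + times[r] / 1_000_000 + " ms");
        }
    }
}
```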

First run the loop at least 3000 times (10000 on a server VM or on 64-bit) as a warm-up, then take your measurements.
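A minimal sketch of "warm up first, then measure", under a few assumptions: the 10000 warm-up calls follow the answer's rule of thumb rather than any guarantee, `getBytes(StandardCharsets.UTF_8)` stands in for `Bytes.toBytes` so the example runs without the HBase dependency, and the class and method names are hypothetical. Each result is also written to a volatile field so the JIT cannot simply discard the calls.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.ToIntFunction;

class WarmupThenMeasure {

    // Receives the accumulated result so the JIT cannot drop the calls as dead code.
    static volatile long sink;

    // Calls `work` warmupIterations times untimed (giving HotSpot a chance to
    // compile it), then returns the nanoseconds spent on measuredIterations calls.
    static long measure(ToIntFunction<String> work, String input,
                        int warmupIterations, int measuredIterations) {
        long total = 0;
        for (int i = 0; i < warmupIterations; i++) {
            total += work.applyAsInt(input);
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredIterations; i++) {
            total += work.applyAsInt(input);
        }
        long elapsed = System.nanoTime() - start;
        sink = total;
        return elapsed;
    }

    public static void main(String[] args) {
        String c = "sgfrt34tdfg34";
        // Warm-up count follows the answer's rule of thumb for server/64-bit VMs.
        long t1 = measure(s -> s.getBytes().length, c, 10_000, 10_000_000);
        // Assumption: getBytes(UTF_8) stands in for Bytes.toBytes (no HBase needed).
        long t2 = measure(s -> s.getBytes(StandardCharsets.UTF_8).length, c, 10_000, 10_000_000);
        System.out.println("t1->" + t1 / 1_000_000 + " ms");
        System.out.println("t2->" + t2 / 1_000_000 + " ms");
    }
}
```

Routing both candidates through the same lambda call site can itself influence what the JIT does, which is one reason dedicated tools such as JMH (the OpenJDK Java Microbenchmark Harness) exist for this kind of measurement.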

Tim B answered Sep 22 '22 22:09



You know there's something wrong, because Bytes.toBytes calls c.getBytes internally:

```java
public static byte[] toBytes(String s) {
    try {
        return s.getBytes(HConstants.UTF8_ENCODING);
    } catch (UnsupportedEncodingException e) {
        LOG.error("UTF-8 not supported?", e);
        return null;
    }
}
```

The source is taken from here. This tells you that the call cannot possibly be faster than the direct call: at the very best (i.e. if it gets inlined), it would take the same time. Generally, though, you would expect it to be slightly slower because of the small overhead of an extra method call.

This is the classic problem with micro-benchmarking in interpreted, garbage-collected environments that have components running at arbitrary times, such as garbage collectors. On top of that, hardware optimizations such as caching skew the picture. As a result, the best way to see what is going on is often to look at the source.
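Garbage collection and run order can still skew a single measurement even after warm-up. One common mitigation, sketched here under stated assumptions, is to repeat the comparison over many rounds, alternate which variant runs first, and compare medians rather than single timings. `toBytes` below is a hypothetical JDK-only stand-in for HBase's `Bytes.toBytes` (a thin wrapper around `String.getBytes`), the class name is illustrative, and `System.gc()` is only a hint that the JVM is free to ignore.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class InterleavedComparison {

    // Receives each measured loop's result so the JIT cannot drop the calls.
    static volatile long sink;

    // Hypothetical stand-in for Bytes.toBytes: a thin wrapper that delegates
    // to String.getBytes with a UTF-8 charset.
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Times n direct String.getBytes calls, in nanoseconds.
    static long direct(String s, int n) {
        long total = 0;
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            total += s.getBytes(StandardCharsets.UTF_8).length;
        }
        long elapsed = System.nanoTime() - start;
        sink = total;
        return elapsed;
    }

    // Times n calls through the wrapper, in nanoseconds.
    static long wrapped(String s, int n) {
        long total = 0;
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            total += toBytes(s).length;
        }
        long elapsed = System.nanoTime() - start;
        sink = total;
        return elapsed;
    }

    // Middle element after sorting (the upper middle for even lengths).
    static long median(long[] values) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    // Alternates which variant runs first in each round, so neither one always
    // benefits from going second, and returns {direct median, wrapped median}.
    static long[] compare(String s, int rounds, int n) {
        long[] d = new long[rounds];
        long[] w = new long[rounds];
        for (int r = 0; r < rounds; r++) {
            System.gc(); // a hint only; the JVM may ignore it
            if (r % 2 == 0) {
                d[r] = direct(s, n);
                w[r] = wrapped(s, n);
            } else {
                w[r] = wrapped(s, n);
                d[r] = direct(s, n);
            }
        }
        return new long[] {median(d), median(w)};
    }

    public static void main(String[] args) {
        long[] m = compare("sgfrt34tdfg34", 21, 1_000_000);
        System.out.println("direct median:  " + m[0] / 1_000 + " us");
        System.out.println("wrapped median: " + m[1] / 1_000 + " us");
    }
}
```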

Sergey Kalinichenko answered Sep 22 '22 22:09
