If I understand correctly, does each fork create a separate virtual machine because each VM instance might end up with slightly different JIT compilation decisions?
I'm also curious about what the time attribute does in the below annotations:
@Warmup(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
TIA, Ole
JMH is short for Java Microbenchmark Harness, a toolkit that helps you implement Java microbenchmarks correctly. It is developed by the same people who implement the Java virtual machine, so these guys know what they are doing.
In JMH, an "operation" is an abstract unit of work. See e.g. this sample result:

Benchmark                Mode  Cnt  Score   Error  Units
MyBenchmark.testMethod   avgt    5  5.068 ± 0.586  ns/op

Here, the performance is 5.068 nanoseconds per operation. Nominally, one operation is one @Benchmark invocation.
With a JMH benchmark you run one or more forks sequentially, and one or more iterations of your benchmark code within each fork. There are two forms of warmup associated with this: at the fork level, the warmups parameter to @Fork specifies how many warmup forks to run (and discard) before the measured forks; at the iteration level, @Warmup specifies how many warmup iterations to run within each fork before measurement starts. The time attribute you asked about sets the minimum duration of each iteration, so time = 500 with TimeUnit.MILLISECONDS means each iteration keeps invoking the benchmark method for at least 500 ms.
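Putting those pieces together, a minimal sketch might look like this (the class name and workload are made up for illustration; the annotations are the ones discussed above):

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Warmup;

public class MyBenchmark {

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    // 2 warmup forks whose results are discarded, then 3 measured forks,
    // each in its own freshly started JVM
    @Fork(value = 3, warmups = 2)
    // 10 warmup iterations per fork; time = 500 means each iteration
    // runs the benchmark method repeatedly for at least 500 ms
    @Warmup(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
    // same shape for the measured iterations
    @Measurement(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
    public int testMethod() {
        // hypothetical workload; one invocation of this method is one "operation"
        return Integer.parseInt("42");
    }
}
```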
JMH offers the fork functionality for a few reasons. One is compilation profile separation, as discussed by Rafael above. But this behaviour is not controlled by the @Fork annotation (unless you choose 0 forks, which means no subprocesses are forked to run benchmarks at all). You can choose to run all the benchmarks as part of your benchmark warmup (thus creating a mixed profile for the JIT to work with) by using the warmup mode control (-wm).
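For example, assuming your benchmarks are packaged in the usual uber-jar (the jar path and benchmark name here are placeholders), you could request bulk warmup from the command line:

```shell
# -wm BULK warms up using all matched benchmarks before measuring each one,
# producing a mixed JIT profile instead of a per-benchmark one
java -jar target/benchmarks.jar -wm BULK MyBenchmark
```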
The reality is that many things can conspire to tilt your results one way or another, and running any benchmark multiple times to establish run-to-run variance is an important practice which JMH supports (and most hand-rolled frameworks don't help with). Reasons for run-to-run variance might include (but I'm sure there are more):
CPUs start in a certain C-state and scale up their frequency in the face of load, then overheat and scale it down. You can control this issue on certain OSs.
Memory alignment of your process can lead to paging behaviour differences.
Running your benchmark with at least a few forks will help shake out these differences and give you an idea of the run-to-run variance in your benchmark. I'd recommend you start with the default of 10 and cut it back (or increase it) experimentally depending on your benchmark.
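If you prefer to control the fork counts from code rather than annotations, the JMH Runner API lets you override them; a small sketch, assuming a benchmark class named MyBenchmark exists on the classpath:

```java
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class BenchmarkRunner {
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include("MyBenchmark") // hypothetical benchmark class name
                .forks(10)              // measured forks, each a fresh JVM
                .warmupForks(2)         // warmup forks whose results are discarded
                .build();
        new Runner(opt).run();
    }
}
```

Options set here take precedence over the annotation values, which makes it easy to experiment with fork counts without recompiling the benchmarks.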