I am a bit surprised that the default (native) implementation of the hashCode() method appears to be ~50x slower than a simple override of the method in the following benchmark.
Consider a basic Book class that does not override hashCode():
public class Book {

    private int id;
    private String title;
    private String author;
    private Double price;

    public Book(int id, String title, String author, Double price) {
        this.id = id;
        this.title = title;
        this.author = author;
        this.price = price;
    }
}
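Since Book does not override hashCode(), it inherits Object's identity-based implementation. This can be confirmed from plain Java (shown here with a trimmed-down stand-in class so the snippet is self-contained):

```java
public class IdentityHashDemo {

    // Minimal stand-in for Book: no hashCode() override, so the inherited
    // identity hash from Object applies.
    static class Book {
        final int id;
        Book(int id) { this.id = id; }
    }

    public static void main(String[] args) {
        Book b = new Book(1);
        // For a class that does not override hashCode(), hashCode() returns
        // the same value as System.identityHashCode().
        System.out.println(b.hashCode() == System.identityHashCode(b)); // true
    }
}
```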
Consider, alternatively, an otherwise identical class, BookWithHash, that overrides hashCode() using the default implementation generated by IntelliJ:
public class BookWithHash {

    private int id;
    private String title;
    private String author;
    private Double price;

    public BookWithHash(int id, String title, String author, Double price) {
        this.id = id;
        this.title = title;
        this.author = author;
        this.price = price;
    }

    @Override
    public boolean equals(final Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;

        final BookWithHash that = (BookWithHash) o;

        if (id != that.id) return false;
        if (title != null ? !title.equals(that.title) : that.title != null) return false;
        if (author != null ? !author.equals(that.author) : that.author != null) return false;
        return price != null ? price.equals(that.price) : that.price == null;
    }

    @Override
    public int hashCode() {
        int result = id;
        result = 31 * result + (title != null ? title.hashCode() : 0);
        result = 31 * result + (author != null ? author.hashCode() : 0);
        result = 31 * result + (price != null ? price.hashCode() : 0);
        return result;
    }
}
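As a quick sanity check on the generated pair: objects that are equals() must report equal hash codes, which the identity hash inherited from Object does not guarantee. A trimmed two-field version of the same IntelliJ-style 31-based pattern (simplified here so the snippet stays self-contained) demonstrates the contract:

```java
import java.util.Objects;

public class HashContractDemo {

    // Trimmed-down analogue of BookWithHash using the same 31-based pattern.
    static class Book {
        final int id;
        final String title;

        Book(int id, String title) { this.id = id; this.title = title; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            Book that = (Book) o;
            return id == that.id && Objects.equals(title, that.title);
        }

        @Override
        public int hashCode() {
            int result = id;
            result = 31 * result + (title != null ? title.hashCode() : 0);
            return result;
        }
    }

    public static void main(String[] args) {
        Book a = new Book(1, "Jane Eyre");
        Book b = new Book(1, "Jane Eyre");
        // Equal objects produce equal hash codes, as HashMap and HashSet require.
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode()); // true
    }
}
```

With the inherited identity hashCode(), two logically equal instances would typically land in different HashMap buckets, which is the usual reason for overriding the pair together.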
Then, the results of the following JMH benchmark suggest to me that the default hashCode() method from the Object class is almost 50x slower than the (seemingly more complex) implementation of hashCode() in the BookWithHash class:
public class Main {

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder().include(Main.class.getSimpleName()).forks(1).build();
        new Runner(opt).run();
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    public long bookWithHashKey() {
        long sum = 0L;
        for (int i = 0; i < 10_000; i++) {
            sum += (new BookWithHash(i, "Jane Eyre", "Charlotte Bronte", 14.99)).hashCode();
        }
        return sum;
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    public long bookKey() {
        long sum = 0L;
        for (int i = 0; i < 10_000; i++) {
            sum += (new Book(i, "Jane Eyre", "Charlotte Bronte", 14.99)).hashCode();
        }
        return sum;
    }
}
Indeed, the summarized results suggest that calling hashCode() on the BookWithHash class is an order of magnitude faster than calling hashCode() on the Book class (see below for the full JMH output):
The reason I am surprised by this is that I understand the default Object.hashCode() implementation to (usually) be a hash of the object's initial memory address, which (for the memory lookup at least) I would expect to be exceedingly fast at the microarchitecture level. These results seem to suggest that hashing the memory location is the bottleneck in Object.hashCode(), compared to the simple override given above. I would appreciate others' insights into my understanding and into what could be causing this surprising behavior.
Full JMH output:
You have misused JMH, so the benchmark scores do not make much sense.

- Consume the result of hashCode(), either with Blackhole.consume or by returning the result from the benchmark method; otherwise dead-code elimination can remove the computation entirely.
- Read the inputs from @State variables in order to avoid constant folding and constant propagation.

In your case, the BookWithHash objects are transient: the JIT realizes the objects do not escape and eliminates the allocation altogether. Furthermore, since some of the object fields are constant, the JIT can simplify the hashCode computation by using constants instead of reading the object fields.
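Constant folding is easiest to observe at the javac level; the JIT applies the same idea at runtime to values it can prove constant. As an illustrative analogue (not the JIT mechanism itself), a concatenation of string literals is folded into a single interned constant at compile time, while the same concatenation with a non-constant operand happens at run time:

```java
public class ConstantFoldingDemo {

    public static void main(String[] args) {
        // javac folds "Jane " + "Eyre" into the single constant "Jane Eyre",
        // so both expressions refer to the same interned String object.
        String folded = "Jane " + "Eyre";
        System.out.println(folded == "Jane Eyre"); // true: folded at compile time

        // A value the compiler cannot prove constant is concatenated at
        // run time and produces a distinct String object.
        String part = args.length == 0 ? "Jane " : "J";
        String runtime = part + "Eyre";
        System.out.println(runtime == "Jane Eyre"); // false: built at run time
    }
}
```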
By contrast, the default hashCode relies on the object's identity, which is why the allocation of Book cannot be eliminated. So your benchmark is actually comparing the allocation of 20,000 objects per invocation (mind the extra Double object) against some arithmetic on local variables and constants. No surprise the latter is much faster.
Another thing to take into account is that the first call of the identity hashCode is much slower than subsequent calls, because the hash code first needs to be generated and stored in the object header, which in turn requires a call into the VM runtime. The second and subsequent calls of hashCode just read the cached value from the object header, and that is indeed much faster.
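The stability of the cached value is observable from plain Java (this sketch only shows that the value is fixed after the first call, not the object-header mechanics themselves):

```java
public class CachedIdentityHashDemo {

    public static void main(String[] args) {
        Object o = new Object();

        // First call: the VM generates the identity hash and stores it
        // in the object header (the relatively slow path).
        int first = o.hashCode();

        // Subsequent calls read the cached value back from the header.
        int second = o.hashCode();

        System.out.println(first == second);                      // true
        System.out.println(first == System.identityHashCode(o));  // true
    }
}
```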
Here is a corrected benchmark that compares 4 cases:
@State(Scope.Benchmark)
public class HashCode {

    int id = 123;
    String title = "Jane Eyre";
    String author = "Charlotte Bronte";
    Double price = 14.99;

    Book book = new Book(id, title, author, price);
    BookWithHash bookWithHash = new BookWithHash(id, title, author, price);

    @Benchmark
    public int book() {
        return book.hashCode();
    }

    @Benchmark
    public int bookWithHash() {
        return bookWithHash.hashCode();
    }

    @Benchmark
    public int newBook() {
        return (book = new Book(id, title, author, price)).hashCode();
    }

    @Benchmark
    public int newBookWithHash() {
        return (bookWithHash = new BookWithHash(id, title, author, price)).hashCode();
    }
}
Benchmark                 Mode  Cnt   Score   Error  Units
HashCode.book             avgt    5   2.907 ± 0.032  ns/op
HashCode.bookWithHash     avgt    5   5.052 ± 0.119  ns/op
HashCode.newBook          avgt    5  74.280 ± 5.384  ns/op
HashCode.newBookWithHash  avgt    5  14.401 ± 0.041  ns/op
The results show that getting the identity hashCode of an existing object is notably faster than computing the hashCode over the object fields (2.9 vs. 5 ns). However, generating a new identity hashCode is a really slow operation, even compared to an object allocation.