Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is an "internal address" in Java?

Tags:

In the Javadoc for Object.hashCode() it states

As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)

It's a common miconception this has something to do with the memory address but it doesn't as that can change without notice and the hashCode() does not and must not change for an object.

@Neet Provided a link to a good answer https://stackoverflow.com/a/565416/57695 but I am looking for more details.


Here is an example to illustrate my concern

Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe"); theUnsafe.setAccessible(true); Unsafe unsafe = (Unsafe) theUnsafe.get(null);  for (int t = 0; t < 10; t++) {     System.gc();     Object[] objects = new Object[10];     for (int i = 0; i < objects.length; i++)         objects[i] = new Object();      for (int i = 0; i < objects.length; i++) {         if (i > 0) System.out.print(", ");         int location = unsafe.getInt(objects, Unsafe.ARRAY_OBJECT_BASE_OFFSET + Unsafe.ARRAY_OBJECT_INDEX_SCALE * i);         System.out.printf("%08x: hc= %08x", location, objects[i].hashCode());     }     System.out.println(); } 

prints

eac00038: hc= 4f47e0ba, eac00048: hc= 2342d884, eac00058: hc= 7994d431, eac00068: hc= 19f71b53, eac00078: hc= 2e22f376, eac00088: hc= 789ddfa3, eac00098: hc= 44c58432, eac000a8: hc= 036a11e4, eac000b8: hc= 28bc917c, eac000c8: hc= 73f378c8 eac00038: hc= 30813486, eac00048: hc= 729f624a, eac00058: hc= 3dee2310, eac00068: hc= 5d400f33, eac00078: hc= 18a60d19, eac00088: hc= 3da5f0f3, eac00098: hc= 596e0123, eac000a8: hc= 450cceb3, eac000b8: hc= 4bd66d2f, eac000c8: hc= 6a9a4f8e eac00038: hc= 711dc088, eac00048: hc= 584b5abc, eac00058: hc= 3b3219ed, eac00068: hc= 564434f7, eac00078: hc= 17f17060, eac00088: hc= 6c08bae7, eac00098: hc= 3126cb1a, eac000a8: hc= 69e0312b, eac000b8: hc= 7dbc345a, eac000c8: hc= 4f114133 eac00038: hc= 50c8c3b8, eac00048: hc= 2ca98e77, eac00058: hc= 2fc83d89, eac00068: hc= 034005e1, eac00078: hc= 6041f871, eac00088: hc= 0b1df416, eac00098: hc= 5b83d60d, eac000a8: hc= 2c5a1e6b, eac000b8: hc= 5083198c, eac000c8: hc= 4f025f9f eac00038: hc= 00c5eb8a, eac00048: hc= 41eab16b, eac00058: hc= 1726099c, eac00068: hc= 4240eca3, eac00078: hc= 346fe350, eac00088: hc= 1db4b415, eac00098: hc= 429addef, eac000a8: hc= 45609812, eac000b8: hc= 489fe953, eac000c8: hc= 7a8f6d64 eac00038: hc= 7e628e42, eac00048: hc= 7869cfe0, eac00058: hc= 6aceb8e2, eac00068: hc= 29cc3436, eac00078: hc= 1d77daaa, eac00088: hc= 27b4de03, eac00098: hc= 535bab52, eac000a8: hc= 274cbf3f, eac000b8: hc= 1f9fd541, eac000c8: hc= 3669ae9f eac00038: hc= 772a3766, eac00048: hc= 749b46a8, eac00058: hc= 7e3bfb66, eac00068: hc= 13f62649, eac00078: hc= 054b8cdc, eac00088: hc= 230cc23b, eac00098: hc= 1aa3c177, eac000a8: hc= 74f2794a, eac000b8: hc= 5af92541, eac000c8: hc= 1afcfd10 eac00038: hc= 396e1dd8, eac00048: hc= 6c696d5c, eac00058: hc= 7d8aea9e, eac00068: hc= 2b316b76, eac00078: hc= 39862621, eac00088: hc= 16315e08, eac00098: hc= 03146a9a, eac000a8: hc= 3162a60a, eac000b8: hc= 4382f3da, eac000c8: hc= 4a578fd6 eac00038: hc= 225765b0, eac00048: hc= 17d5176d, eac00058: hc= 26f50154, eac00068: hc= 1f2a45c7, eac00078: hc= 104b1bcd, eac00088: hc= 330e3816, eac00098: hc= 6a844689, eac000a8: hc= 12330301, eac000b8: hc= 530a3ffc, eac000c8: hc= 45eee3fb eac00038: hc= 3f9432e0, eac00048: hc= 1a9830bc, eac00058: hc= 7da79447, eac00068: hc= 04f801c4, eac00078: hc= 363bed68, eac00088: hc= 185f62a9, eac00098: hc= 1e4651bf, eac000a8: hc= 1aa0e220, eac000b8: hc= 385db088, eac000c8: hc= 0ef0cda1 

As a side note; If you look at this code

if (value == 0) value = 0xBAD ; 

It appears that 0xBAD is twice as likely as normal as any hashCode as 0 is mapped to this value. If you run this long enough you see

long count = 0, countBAD = 0; while (true) {     for (int i = 0; i < 200000000; i++) {         int hc = new Object().hashCode();         if (hc == 0xBAD)             countBAD++;         count++;     }     System.out.println("0xBAD ratio is " + (double) (countBAD << 32) / count + " times expected."); } 

prints

0xBAD ratio is 2.0183116992481205 times expected. 
like image 744
Peter Lawrey Avatar asked Dec 13 '12 12:12

Peter Lawrey


People also ask

What is memory address in Java?

To find the memory address of a particular object in the JVM, we can use the addressOf() method: String answer = "42"; System.out.println("The memory address is " + VM.current().addressOf(answer)); This will print: The memory address is 31864981224. There are different compressed reference modes in the HotSpot JVM.

What is the address of an object in Java?

The hashCode() method is native because in Java it is impossible to find the address of an object, so it uses native languages like C/C++ to find the address of the object. Use of hashCode() method: It returns a hash value that is used to search objects in a collection.

Can we get memory address in Java?

Getting the memory addresses of variables is meaningless within Java, since the JVM is at liberty to implement objects and move them as it seems fit (your objects may/will move around during garbage collection etc.) Integer.


1 Answers

This is clearly implementation-specific.

Below I include the Object.hashCode() implementation used in OpenJDK 7.

The function supports six different calculation methods, only two of which take any notice of the object's address (the "address" being the C++ oop cast to intptr_t). One of the two methods uses the address as-is, whereas the other does some bit twiddling and then mashes the result with an infrequently-updated random number.

Of the remaining methods, one returns a constant (presumably for testing), one returns sequential numbers, and the rest are based on pseudo-random sequences.

It would appear that the method can be chosen at runtime, and the default seems to be method 0, which is os::random(). The latter is a linear congruential generator, with an alleged race condition thrown in. :-) The race condition is acceptable because at worst it would result in two objects sharing the same hash code; this does not break any invariants.

The computation is performed the first time a hash code is required. To maintain consistency, the result is then stored in the object's header and is returned on subsequent calls to hashCode(). The caching is done outside this function.

In summary, the notion that Object.hashCode() is based on the object's address is largely a historic artefact that has been obsoleted by the properties of modern garbage collectors.

// hotspot/src/share/vm/runtime/synchronizer.hpp  // hashCode() generation : // // Possibilities: // * MD5Digest of {obj,stwRandom} // * CRC32 of {obj,stwRandom} or any linear-feedback shift register function. // * A DES- or AES-style SBox[] mechanism // * One of the Phi-based schemes, such as: //   2654435761 = 2^32 * Phi (golden ratio) //   HashCodeValue = ((uintptr_t(obj) >> 3) * 2654435761) ^ GVars.stwRandom ; // * A variation of Marsaglia's shift-xor RNG scheme. // * (obj ^ stwRandom) is appealing, but can result //   in undesirable regularity in the hashCode values of adjacent objects //   (objects allocated back-to-back, in particular).  This could potentially //   result in hashtable collisions and reduced hashtable efficiency. //   There are simple ways to "diffuse" the middle address bits over the //   generated hashCode values: //  static inline intptr_t get_next_hash(Thread * Self, oop obj) {   intptr_t value = 0 ;   if (hashCode == 0) {      // This form uses an unguarded global Park-Miller RNG,      // so it's possible for two threads to race and generate the same RNG.      // On MP system we'll have lots of RW access to a global, so the      // mechanism induces lots of coherency traffic.      value = os::random() ;   } else   if (hashCode == 1) {      // This variation has the property of being stable (idempotent)      // between STW operations.  This can be useful in some of the 1-0      // synchronization schemes.      intptr_t addrBits = intptr_t(obj) >> 3 ;      value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;   } else   if (hashCode == 2) {      value = 1 ;            // for sensitivity testing   } else   if (hashCode == 3) {      value = ++GVars.hcSequence ;   } else   if (hashCode == 4) {      value = intptr_t(obj) ;   } else {      // Marsaglia's xor-shift scheme with thread-specific state      // This is probably the best overall implementation -- we'll      // likely make this the default in future releases.      unsigned t = Self->_hashStateX ;      t ^= (t << 11) ;      Self->_hashStateX = Self->_hashStateY ;      Self->_hashStateY = Self->_hashStateZ ;      Self->_hashStateZ = Self->_hashStateW ;      unsigned v = Self->_hashStateW ;      v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;      Self->_hashStateW = v ;      value = v ;   }    value &= markOopDesc::hash_mask;   if (value == 0) value = 0xBAD ;   assert (value != markOopDesc::no_hash, "invariant") ;   TEVENT (hashCode: GENERATE) ;   return value; } 
like image 171
NPE Avatar answered Sep 27 '22 16:09

NPE