
Why do HashSets not have a stable serialization?

Tags: java

Take a HashSet in Java. Put a string in it. Serialize it. You end up with some bytes - bytesA.

Take bytesA, deserialize it back as an Object - fromBytes.

Now reserialize fromBytes and you've got yourself another array of bytes - bytesB.

Strangely enough, these two byte arrays are not equal. One byte is different! Why? Interestingly, this does not affect TreeSet or HashMap. It does however affect LinkedHashSet.

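import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Wrapper class added so the snippet compiles; the name is arbitrary.
public class HashSetRoundTrip {
  public static void main(String[] args) throws Exception {
    Set<String> stringSet = new HashSet<>();
    stringSet.add("aaaaaaaaaa");

    // Serialize it
    byte[] bytesA;
    try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
         ObjectOutputStream out = new ObjectOutputStream(bos)) {
      out.writeObject(stringSet);
      out.flush();
      bytesA = bos.toByteArray();
    }

    // Deserialize it
    Object fromBytes;
    try (ByteArrayInputStream is = new ByteArrayInputStream(bytesA);
         ObjectInputStream ois = new ObjectInputStream(is)) {
      fromBytes = ois.readObject();
    }

    // Serialize it again
    byte[] bytesB;
    try (ByteArrayOutputStream bos = new ByteArrayOutputStream();
         ObjectOutputStream out = new ObjectOutputStream(bos)) {
      out.writeObject(fromBytes);
      out.flush();
      bytesB = bos.toByteArray();
    }

    // Fails when run with assertions enabled (-ea)
    assert Arrays.equals(bytesA, bytesB);
    // array contents differ at index [43], expected: <16> but was: <2>
  }
}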

In case these help: xxd hex dump of bytesA

00000000: aced 0005 7372 0011 6a61 7661 2e75 7469  ....sr..java.uti
00000010: 6c2e 4861 7368 5365 74ba 4485 9596 b8b7  l.HashSet.D.....
00000020: 3403 0000 7870 770c 0000 0010 3f40 0000  4...xpw.....?@..
00000030: 0000 0001 7400 0a61 6161 6161 6161 6161  ....t..aaaaaaaaa
00000040: 6178                                     ax

xxd hex dump of bytesB

00000000: aced 0005 7372 0011 6a61 7661 2e75 7469  ....sr..java.uti
00000010: 6c2e 4861 7368 5365 74ba 4485 9596 b8b7  l.HashSet.D.....
00000020: 3403 0000 7870 770c 0000 0002 3f40 0000  4...xpw.....?@..
00000030: 0000 0001 7400 0a61 6161 6161 6161 6161  ....t..aaaaaaaaa
00000040: 6178                                     ax

The difference is in the 3rd line, 6th column: 0010 in bytesA versus 0002 in bytesB.

I'm on Java 11.0.3.
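The comparison across collection types mentioned in the question can be repeated with a small helper. This is a sketch rather than the asker's code: `RoundTripStability` and `isStable` are hypothetical names, and the outcome may vary between JDK versions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper class; none of these names come from the original post.
class RoundTripStability {
    static byte[] serialize(Object value) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if serializing, deserializing, and serializing again yields identical bytes.
    static boolean isStable(Object value) {
        byte[] first = serialize(value);
        byte[] second = serialize(deserialize(first));
        return Arrays.equals(first, second);
    }

    public static void main(String[] args) {
        Set<String> hashSet = new HashSet<>();
        Set<String> linkedHashSet = new LinkedHashSet<>();
        Set<String> treeSet = new TreeSet<>();
        Map<String, String> hashMap = new HashMap<>();
        hashSet.add("aaaaaaaaaa");
        linkedHashSet.add("aaaaaaaaaa");
        treeSet.add("aaaaaaaaaa");
        hashMap.put("aaaaaaaaaa", "value");

        System.out.println("HashSet stable:       " + isStable(hashSet));
        System.out.println("LinkedHashSet stable: " + isStable(linkedHashSet));
        System.out.println("TreeSet stable:       " + isStable(treeSet));
        System.out.println("HashMap stable:       " + isStable(hashMap));
    }
}
```

Each collection holds a single entry and is created with its no-argument constructor, matching the setup in the question.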


(RESOLVED)

As per Alex R's response: HashSet's writeObject stores the capacity, loadFactor, and size of the backing HashMap, but its readObject recalculates the capacity from the size and loadFactor alone:

capacity = (int)Math.min((float)size * Math.min(1.0F / loadFactor, 4.0F), 1.07374182E9F);

Other than a sanity check, it actually ignores the capacity value that was originally stored!
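To make the arithmetic concrete, here is a minimal sketch that re-evaluates the quoted expression outside the JDK. The names `CapacityFormula` and `recomputedCapacity` are hypothetical, and the constant is assumed to be 2^30, which is the value the decompiled literal `1.07374182E9F` rounds to as a float.

```java
// Hypothetical helper that mirrors the expression quoted above; it is not JDK code.
class CapacityFormula {
    // Assumed to correspond to the decompiled literal 1.07374182E9F (2^30).
    static final int MAXIMUM_CAPACITY = 1 << 30;

    // Capacity derived only from the element count and the load factor;
    // the capacity stored in the stream plays no part in it.
    static int recomputedCapacity(int size, float loadFactor) {
        return (int) Math.min((float) size * Math.min(1.0f / loadFactor, 4.0f),
                              (float) MAXIMUM_CAPACITY);
    }

    public static void main(String[] args) {
        // One element with a load factor of 0.75, as in the question.
        System.out.println(recomputedCapacity(1, 0.75f)); // prints 1
    }
}
```

For the single-element set this yields 1, far below the 16 written into bytesA. The 2 that shows up in bytesB is presumably the length of the backing table after it grows to hold the element, since a table created with capacity 1 and load factor 0.75 has to resize on the first insertion.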

asked Oct 24 '19 at 23:10 by tamperingbeluga




1 Answer

If you create a HashSet with the no-argument constructor, it creates a backing HashMap with a default capacity of 16.

When you deserialize it, the capacity may be initialized to less than 16 if the set contains fewer entries. That is what happens in this case.

Take a look at the readObject implementation of HashSet to see how the capacity is calculated.

Printing the two byte arrays hints that this is indeed what happened:

[..., 16, ...]
[..., 2, ...]
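The same check can be done by decoding the stored capacity directly instead of scanning the printed arrays. The sketch below is illustrative only: `StoredCapacityDemo` and its methods are hypothetical names, and the offset of 40 is an assumption read off the hex dumps in the question (the 4-byte value immediately after the `77 0c` block-data marker), so it only holds for a stream whose top-level object is a plain `java.util.HashSet`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class; the names are illustrative and not from the original post.
class StoredCapacityDemo {
    // Assumption taken from the hex dumps: the capacity field occupies bytes 40..43.
    static final int CAPACITY_OFFSET = 40;

    static byte[] serialize(Object value) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    static Object deserialize(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads the big-endian int that was written as the capacity.
    static int storedCapacity(byte[] bytes) {
        return ByteBuffer.wrap(bytes, CAPACITY_OFFSET, 4).getInt();
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        set.add("aaaaaaaaaa");

        byte[] bytesA = serialize(set);
        byte[] bytesB = serialize(deserialize(bytesA));

        System.out.println("capacity in bytesA: " + storedCapacity(bytesA));
        System.out.println("capacity in bytesB: " + storedCapacity(bytesB));
    }
}
```

On the asker's Java 11 setup these two values would correspond to the 16 and 2 shown above; other JDK versions may size the rebuilt table differently.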
answered Sep 18 '22 at 23:09 by Alex R