Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Java collections: What happens when "size" exceeds "int"?

I know that Java collections are very memory-hungry, and did a test myself, proving that 4GB is barely enough to store few millions of Integers into a HashSet.

But what if I has "enough" memory? What would happen to Collection.size()?

EDIT: Solved: Collection.size() returns Integer.MAX when the integer range is exceeded.
New question: how to determine the "real" count of elements of a collection then?

NOTE 1: Sorry, this is probably a let-me-google-it-for-you question, but I really didn't find anything ;)

NOTE 2: As far as I understand it, each integer entry of a set is: reference + cached_hashcode + boxed_integer_object + real_int_value, right?

NOTE 3: Funny, even with JDK7 and "compressed pointers", when the JVM uses 2GB of real memory, it shows only 1.5GB allocated memory in VisualVM.

For those who care:

Test sources:

import java.util.*;
import java.lang.management.*;

public final class _BoxedValuesInSetMemoryConsumption {
  private final static int MILLION = 1000 * 1000;

  public static void main(String... args) {
    Set<Integer> set = new HashSet<Integer>();

    for (int i = 1;; ++i) {
      if ((i % MILLION) == 0) {
        int milsOfEntries = (i / MILLION);
        long mbytes = ManagementFactory.getMemoryMXBean().
            getHeapMemoryUsage().getUsed() / MILLION;
        int ratio = (int) mbytes / milsOfEntries;
        System.out.println(milsOfEntries + " mil, " + mbytes + " MB used, "
            + " ratio of bytes per entry: " + ratio);
      }

      set.add(i);
    }
  }
}

Execution parameters:

Tested with x64 version of JDK7 build 105 under OpenSuse 11.3 x64.

-XX:+UseCompressedOops -Xmx2048m

Output result:

1 mil, 56 MB used,  ratio of bytes per entry: 56
2 mil, 113 MB used,  ratio of bytes per entry: 56
3 mil, 161 MB used,  ratio of bytes per entry: 53
4 mil, 225 MB used,  ratio of bytes per entry: 56
5 mil, 274 MB used,  ratio of bytes per entry: 54
6 mil, 322 MB used,  ratio of bytes per entry: 53
7 mil, 403 MB used,  ratio of bytes per entry: 57
8 mil, 452 MB used,  ratio of bytes per entry: 56
9 mil, 499 MB used,  ratio of bytes per entry: 55
10 mil, 548 MB used,  ratio of bytes per entry: 54
11 mil, 596 MB used,  ratio of bytes per entry: 54
12 mil, 644 MB used,  ratio of bytes per entry: 53
13 mil, 827 MB used,  ratio of bytes per entry: 63
14 mil, 874 MB used,  ratio of bytes per entry: 62
15 mil, 855 MB used,  ratio of bytes per entry: 57
16 mil, 902 MB used,  ratio of bytes per entry: 56
17 mil, 951 MB used,  ratio of bytes per entry: 55
18 mil, 999 MB used,  ratio of bytes per entry: 55
19 mil, 1047 MB used,  ratio of bytes per entry: 55
20 mil, 1096 MB used,  ratio of bytes per entry: 54
21 mil, 1143 MB used,  ratio of bytes per entry: 54
22 mil, 1191 MB used,  ratio of bytes per entry: 54
23 mil, 1239 MB used,  ratio of bytes per entry: 53
24 mil, 1288 MB used,  ratio of bytes per entry: 53
25 mil, 1337 MB used,  ratio of bytes per entry: 53
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

At the end, about 2 GiB real memory were used, instead of displayed 1.3 GiB, so the consumption for each entry is even larger than 53 bytes.

like image 820
java.is.for.desktop Avatar asked Aug 23 '10 13:08

java.is.for.desktop


3 Answers

I know that Java collections are very memory-hungry, and did a test myself, proving that 4GB is barely enough to store few millions of Integers into a HashSet.

Java Heap != System Memory. Java's default heap size is only 128MB. Note this is also different from the memory the JVM uses.

Regarding your question: from the docs,

public int size()

Returns the number of elements in this collection. If this collection contains more than Integer.MAX_VALUE elements, returns Integer.MAX_VALUE.

like image 122
quantumSoup Avatar answered Nov 07 '22 16:11

quantumSoup


Your question seems to have a quite different content than the title.

You already answered the question in the title (Integer.MAX_VALUE is returned). And no: there's no way you can find out the "true" size with the normal APIs safe for iterating over the collection and counting (using a long of course).

If you want to store a Set of int values and you know that the range and amount of values can become very big, then a BitSet might actually be a better implementation:

import java.util.*;
import java.lang.management.*;

public final class IntegersInBitSetMemoryConsumption {
  private final static int MILLION = 1000 * 1000;

  public static void main(String... args) {
    BitSet set = new BitSet(Integer.MAX_VALUE);

    for (int i = 1;; ++i) {
      if ((i % MILLION) == 0) {
        int milsOfEntries = (i / MILLION);
        long mbytes = ManagementFactory.getMemoryMXBean().
            getHeapMemoryUsage().getUsed() / MILLION;
        double ratio = mbytes / milsOfEntries;
        System.out.println(milsOfEntries + " mil, " + mbytes + " MiB used, "
            + " ratio of bytes per entry: " + ratio);
      }

      set.set(i);
    }
  }
}

This will produce a constant-size data structure that can hold all values inside the range without changing size and occupying a relatively small amount of memory (1 bit per possible value plus some overhead).

This method has two drawbacks, however:

  • it doesn't support negative int values
  • it doesn't provide the Set API

Both can easily be worked around by writing a wrapper that uses two BitSet objects (possibly lazily allocated) to hold the positive and negative value range respectively and implements adapter methods for the Set interface.

like image 21
Joachim Sauer Avatar answered Nov 07 '22 15:11

Joachim Sauer


From the source code:

 /**
 * Returns the number of elements in this collection.  If this collection
 * contains more than <tt>Integer.MAX_VALUE</tt> elements, returns
 * <tt>Integer.MAX_VALUE</tt>.
 * 
 * @return the number of elements in this collection
 */
int size();
like image 35
Ido Weinstein Avatar answered Nov 07 '22 15:11

Ido Weinstein