I know that Java collections are very memory-hungry, and did a test myself, proving that 4GB is barely enough to store few millions of Integer
s into a HashSet
.
But what if I has "enough" memory? What would happen to Collection.size()
?
EDIT: Solved: Collection.size()
returns Integer.MAX
when the integer range is exceeded.
New question: how to determine the "real" count of elements of a collection then?
NOTE 1: Sorry, this is probably a let-me-google-it-for-you question, but I really didn't find anything ;)
NOTE 2: As far as I understand it, each integer entry of a set is:
reference + cached_hashcode + boxed_integer_object + real_int_value
, right?
NOTE 3: Funny, even with JDK7 and "compressed pointers", when the JVM uses 2GB of real memory, it shows only 1.5GB allocated memory in VisualVM
.
For those who care:
import java.util.*;
import java.lang.management.*;
public final class _BoxedValuesInSetMemoryConsumption {
private final static int MILLION = 1000 * 1000;
public static void main(String... args) {
Set<Integer> set = new HashSet<Integer>();
for (int i = 1;; ++i) {
if ((i % MILLION) == 0) {
int milsOfEntries = (i / MILLION);
long mbytes = ManagementFactory.getMemoryMXBean().
getHeapMemoryUsage().getUsed() / MILLION;
int ratio = (int) mbytes / milsOfEntries;
System.out.println(milsOfEntries + " mil, " + mbytes + " MB used, "
+ " ratio of bytes per entry: " + ratio);
}
set.add(i);
}
}
}
Tested with x64 version of JDK7 build 105 under OpenSuse 11.3 x64.
-XX:+UseCompressedOops -Xmx2048m
1 mil, 56 MB used, ratio of bytes per entry: 56
2 mil, 113 MB used, ratio of bytes per entry: 56
3 mil, 161 MB used, ratio of bytes per entry: 53
4 mil, 225 MB used, ratio of bytes per entry: 56
5 mil, 274 MB used, ratio of bytes per entry: 54
6 mil, 322 MB used, ratio of bytes per entry: 53
7 mil, 403 MB used, ratio of bytes per entry: 57
8 mil, 452 MB used, ratio of bytes per entry: 56
9 mil, 499 MB used, ratio of bytes per entry: 55
10 mil, 548 MB used, ratio of bytes per entry: 54
11 mil, 596 MB used, ratio of bytes per entry: 54
12 mil, 644 MB used, ratio of bytes per entry: 53
13 mil, 827 MB used, ratio of bytes per entry: 63
14 mil, 874 MB used, ratio of bytes per entry: 62
15 mil, 855 MB used, ratio of bytes per entry: 57
16 mil, 902 MB used, ratio of bytes per entry: 56
17 mil, 951 MB used, ratio of bytes per entry: 55
18 mil, 999 MB used, ratio of bytes per entry: 55
19 mil, 1047 MB used, ratio of bytes per entry: 55
20 mil, 1096 MB used, ratio of bytes per entry: 54
21 mil, 1143 MB used, ratio of bytes per entry: 54
22 mil, 1191 MB used, ratio of bytes per entry: 54
23 mil, 1239 MB used, ratio of bytes per entry: 53
24 mil, 1288 MB used, ratio of bytes per entry: 53
25 mil, 1337 MB used, ratio of bytes per entry: 53
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
At the end, about 2 GiB real memory were used, instead of displayed 1.3 GiB, so the consumption for each entry is even larger than 53 bytes.
I know that Java collections are very memory-hungry, and did a test myself, proving that 4GB is barely enough to store few millions of
Integers
into aHashSet
.
Java Heap != System Memory. Java's default heap size is only 128MB. Note this is also different from the memory the JVM uses.
Regarding your question: from the docs,
public int size()
Returns the number of elements in this collection. If this collection contains more than
Integer.MAX_VALUE
elements, returnsInteger.MAX_VALUE
.
Your question seems to have a quite different content than the title.
You already answered the question in the title (Integer.MAX_VALUE
is returned). And no: there's no way you can find out the "true" size with the normal APIs safe for iterating over the collection and counting (using a long
of course).
If you want to store a Set
of int
values and you know that the range and amount of values can become very big, then a BitSet
might actually be a better implementation:
import java.util.*;
import java.lang.management.*;
public final class IntegersInBitSetMemoryConsumption {
private final static int MILLION = 1000 * 1000;
public static void main(String... args) {
BitSet set = new BitSet(Integer.MAX_VALUE);
for (int i = 1;; ++i) {
if ((i % MILLION) == 0) {
int milsOfEntries = (i / MILLION);
long mbytes = ManagementFactory.getMemoryMXBean().
getHeapMemoryUsage().getUsed() / MILLION;
double ratio = mbytes / milsOfEntries;
System.out.println(milsOfEntries + " mil, " + mbytes + " MiB used, "
+ " ratio of bytes per entry: " + ratio);
}
set.set(i);
}
}
}
This will produce a constant-size data structure that can hold all values inside the range without changing size and occupying a relatively small amount of memory (1 bit per possible value plus some overhead).
This method has two drawbacks, however:
int
valuesSet
APIBoth can easily be worked around by writing a wrapper that uses two BitSet
objects (possibly lazily allocated) to hold the positive and negative value range respectively and implements adapter methods for the Set
interface.
From the source code:
/**
* Returns the number of elements in this collection. If this collection
* contains more than <tt>Integer.MAX_VALUE</tt> elements, returns
* <tt>Integer.MAX_VALUE</tt>.
*
* @return the number of elements in this collection
*/
int size();
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With