
Why does storing a long string cause an OOM error but breaking it into a list of short strings does not?

I had a Java program that was using a StringBuilder to build a string from an input stream and eventually it caused an out of memory error when the string got too long. I tried breaking it up into shorter strings and storing them in an ArrayList and this avoided the OOM even though I was trying to store the same amount of data. Why is this?

My suspicion is that with one very long string, the computer has to find one contiguous place in memory for it, but with an ArrayList it can use multiple smaller places in memory. I know memory can be tricky in Java, so this question may not have a straightforward answer, but hopefully someone can put me on the right track. Thanks!

Rexana asked Jul 31 '17 00:07



2 Answers

Essentially, you are correct.

A StringBuilder (more precisely, AbstractStringBuilder) uses a char[] to store the characters of the string (though a String itself is generally not a char[]). While Java does not guarantee that an array is stored in contiguous memory, it most probably is. Thus, whenever an append no longer fits into the underlying array, a new, larger array is allocated and the old contents are copied into it. If the new array is too large for the available heap, an OutOfMemoryError is thrown.

Indeed, executing the code

StringBuilder b = new StringBuilder();
for (int i = 0; i < 7 * Math.pow(10, 8); i++)
    b.append("a"); // line 11

throws the exception:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3332)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
    at java.lang.StringBuilder.append(StringBuilder.java:136)
    at test1.Main.main(Main.java:11)

When execution reaches line 3332 of Arrays.java (char[] copy = new char[newLength]; inside Arrays.copyOf), the exception is thrown because there is not enough memory for an array of size newLength.

Note also the message given with the error: "Java heap space". This means that an object (an array, in this case) could not be allocated in the Java heap. (Edit: there is another possible cause for this error, see Marco13's answer).

The Java Virtual Machine Specification says the following about the heap:

2.5.3. Heap

The Java Virtual Machine has a heap that is shared among all Java Virtual Machine threads. The heap is the run-time data area from which memory for all class instances and arrays is allocated.

... The memory for the heap does not need to be contiguous.

A Java Virtual Machine implementation may provide the programmer or the user control over the initial size of the heap, as well as, if the heap can be dynamically expanded or contracted, control over the maximum and minimum heap size.

The following exceptional condition is associated with the heap:

  • If a computation requires more heap than can be made available by the automatic storage management system, the Java Virtual Machine throws an OutOfMemoryError.

Breaking the array into smaller arrays of the same total size avoids the OOME because each array can be stored separately in a smaller contiguous area. Of course, you "pay" for this by having to keep track of the pieces, for example by pointing from each array to the next one.

Compare the above code with this one:

static StringBuilder b1 = new StringBuilder();
static StringBuilder b2 = new StringBuilder();
...
static StringBuilder b10 = new StringBuilder();

public static void main(String[] args) {
    for (int i = 0; i < Math.pow(10, 8); i++)
        b1.append("a");
    System.out.println(b1.length());
    // ...
    for (int i = 0; i < Math.pow(10, 8); i++)
        b10.append("a");
    System.out.println(b10.length());
}

The output is

100000000
100000000
100000000
100000000
100000000
100000000
100000000
100000000

and then an OOME is thrown.

While the first program failed before it could hold 7 * Math.pow(10, 8) array cells in a single array, this one stores at least 8 * Math.pow(10, 8) cells in total.
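The chunked approach described in the question could look roughly like the sketch below. It is not the asker's actual code: the class name ChunkedText, the method readChunks and the chunk size are assumptions made up for this illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: keeps text as many small strings instead of one big one.
class ChunkedText {

    // Reads everything from 'in' and returns it as strings of at most chunkSize chars.
    static List<String> readChunks(Reader in, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> chunks = new ArrayList<>();
        char[] buf = new char[chunkSize];
        int filled = 0;
        int n;
        while ((n = in.read(buf, filled, chunkSize - filled)) != -1) {
            filled += n;
            if (filled == chunkSize) {   // buffer full: emit one chunk
                chunks.add(new String(buf, 0, filled));
                filled = 0;
            }
        }
        if (filled > 0) {                // emit the final, shorter chunk
            chunks.add(new String(buf, 0, filled));
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        List<String> chunks = readChunks(new StringReader("abcdefghij"), 4);
        System.out.println(chunks); // [abcd, efgh, ij]
    }
}
```

With this layout the largest single allocation is bounded by the chunk size, so no allocation ever needs one huge array. The total amount of heap used for the text stays roughly the same.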

Note that the size of the heap can be changed with VM initialization parameters, so the size which will throw the OOME is not constant between systems.
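To see which limits apply on a given system, the running JVM can be asked directly. The following is a small sketch using the standard Runtime API; the class name HeapInfo and the helper toMiB are made up for this example.

```java
// Prints the heap limits the current JVM is running with.
class HeapInfo {

    // Converts a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory: the most the JVM will attempt to use for the heap
        System.out.println("max heap   (MiB): " + toMiB(rt.maxMemory()));
        // totalMemory: what is currently reserved for the heap
        System.out.println("total heap (MiB): " + toMiB(rt.totalMemory()));
        // freeMemory: the unused part of the currently reserved heap
        System.out.println("free heap  (MiB): " + toMiB(rt.freeMemory()));
    }
}
```

Running it once as java HeapInfo and once as java -Xmx4g HeapInfo should show a different maximum, which is why the string length at which the OOME appears differs between setups.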

user1803551 answered Oct 11 '22 10:10


It would have been helpful if you had posted a stack trace, if one was available. But there is one very likely cause of the OutOfMemoryError that you observed.

(This answer can only be an "educated guess", though. Nobody can pinpoint the reason without examining the conditions under which the error occurred on your system.)

When strings are concatenated with a StringBuilder, the StringBuilder internally maintains a char[] array containing the characters of the string under construction.

When a sequence of strings is appended, this char[] array may eventually have to grow. This is done in the AbstractStringBuilder base class:

/**
 * This method has the same contract as ensureCapacity, but is
 * never synchronized.
 */
private void ensureCapacityInternal(int minimumCapacity) {
    // overflow-conscious code
    if (minimumCapacity - value.length > 0)
        expandCapacity(minimumCapacity);
}

/**
 * This implements the expansion semantics of ensureCapacity with no
 * size check or synchronization.
 */
void expandCapacity(int minimumCapacity) {
    int newCapacity = value.length * 2 + 2;
    if (newCapacity - minimumCapacity < 0)
        newCapacity = minimumCapacity;
    if (newCapacity < 0) {
        if (minimumCapacity < 0) // overflow
            throw new OutOfMemoryError();
        newCapacity = Integer.MAX_VALUE;
    }
    value = Arrays.copyOf(value, newCapacity);
}

It is called whenever the string builder notices that the new data does not fit into the currently allocated array.

This is obviously one place where an OutOfMemoryError may be thrown. (Strictly speaking, it does not necessarily have to be really "out of memory" there. It is just checking for an overflow in view of the maximum size that an array can have...).

(Edit: Also have a look at the answer by user1803551 : This does not necessarily have to be the place where your error came from! Yours might indeed come from the Arrays class, or rather from inside the JVM)
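The overflow branch in the quoted expandCapacity is easier to follow with concrete numbers. The sketch below re-creates only the arithmetic on plain ints and allocates nothing; the class name CapacityOverflow and the method newCapacity are made up for this illustration and are not part of the JDK.

```java
// Re-creates the arithmetic of the quoted expandCapacity without allocating an array.
class CapacityOverflow {

    // Mirrors the quoted logic and returns the capacity that would be requested.
    static int newCapacity(int currentLength, int minimumCapacity) {
        int newCapacity = currentLength * 2 + 2;   // may wrap around to a negative int
        if (newCapacity - minimumCapacity < 0) {
            newCapacity = minimumCapacity;
        }
        if (newCapacity < 0) {
            if (minimumCapacity < 0) {             // even the minimum overflowed
                throw new OutOfMemoryError();
            }
            newCapacity = Integer.MAX_VALUE;
        }
        return newCapacity;
    }

    public static void main(String[] args) {
        // Small array: plain doubling plus two.
        System.out.println(newCapacity(16, 17));                       // 34

        // Large array: the doubled value no longer fits into an int.
        int big = 1_200_000_000;
        System.out.println(big * 2 + 2);                               // -1894967294
        System.out.println(newCapacity(big, big + 1));                 // 2147483647
    }
}
```

In the last call the request is clamped to Integer.MAX_VALUE. An error thrown after that point would come from the actual allocation in Arrays.copyOf, not from the explicit throw in this method.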

When examining the code closely, you will notice that the size of the array is doubled each time its capacity is expanded. This is crucial: if the code only ensured that the new data block could be appended, then appending n characters (or other strings of fixed length) to the StringBuilder would have a running time of O(n²), because every append would copy the whole array. When the size is increased by a constant factor (here, 2), the running time is only O(n).
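The difference between the two growth strategies can be checked by counting how many characters get copied in total. This is a simulation on plain numbers, not a measurement of the real StringBuilder; the class name GrowthCost and the method charsCopied are assumptions made for this sketch.

```java
import java.util.function.LongUnaryOperator;

// Counts the characters copied while a buffer grows to hold n chars,
// appending one char at a time.
class GrowthCost {

    // 'grow' maps the current capacity to the next one when the buffer is full.
    static long charsCopied(int n, long initialCapacity, LongUnaryOperator grow) {
        long copied = 0;
        long capacity = initialCapacity;
        for (long size = 0; size < n; size++) {
            if (size == capacity) {   // full: a new array is needed, old contents are copied
                copied += size;
                capacity = grow.applyAsLong(capacity);
            }
        }
        return copied;
    }

    public static void main(String[] args) {
        int n = 1000;
        // Doubling, as in the quoted expandCapacity.
        System.out.println(charsCopied(n, 16, c -> c * 2 + 2)); // 1122
        // Exact fit: grow by a single slot each time.
        System.out.println(charsCopied(n, 16, c -> c + 1));     // 499380
    }
}
```

For 1000 appended characters, doubling copies about as many characters as are stored, while exact-fit growth copies roughly n²/2 of them.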

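However, this doubling of the size may lead to an OutOfMemoryError even though the actual size of the resulting string is still far smaller than the limit: at the moment of expansion, the builder requests an array about twice as large as the current one, and the old array has to stay alive until its contents have been copied.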
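The gap between what a builder holds and what it has reserved can be observed with the public length() and capacity() methods. A minimal sketch follows; the class name CapacitySlack and the method fill are made up, and the exact capacity printed depends on the JDK version, so no expected value is given.

```java
// Shows that a StringBuilder usually reserves more space than it currently uses.
class CapacitySlack {

    // Appends n single characters to a fresh builder and returns it.
    static StringBuilder fill(int n) {
        StringBuilder b = new StringBuilder();
        for (int i = 0; i < n; i++) {
            b.append('a');
        }
        return b;
    }

    public static void main(String[] args) {
        StringBuilder b = fill(1000);
        System.out.println("length:   " + b.length());    // characters actually stored
        System.out.println("capacity: " + b.capacity());  // characters reserved (JDK-specific)
    }
}
```

With the growth rule quoted above, the next expansion would need the old array plus a new one of about twice the size at the same time, which is roughly three times the current capacity.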

Marco13 answered Oct 11 '22 11:10