Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does v8 run out of memory in this situation?

According to the node.js docs a node has a 512meg limit on 32bit version and a 1.4gig limit on the 64bit version. The limits are similar for Chrome AFAICT. (+/- 25%)

So, why does this code run out of memory when it never uses more than ~424meg of memory?

Here's the code (The code is nonsense. This question is not about what the code is doing it's about why the code is failing).

var lookup = 'superCaliFragilisticExpialidosiousThispartdoesnotrealllymattersd';
function encode (num) {
  return lookup[num];
}

function makeString(uint8) {
  var output = '';

  for (var i = 0, length = uint8.length; i < length; i += 3) {
    var temp = (uint8[i] << 16) + (uint8[i + 1] << 8) + (uint8[i + 2]);
    output += encode(temp >> 18 & 0x3F) + encode(temp >> 12 & 0x3F) + encode(temp >> 6 & 0x3F) + encode(temp & 0x3F);
  }

  return output;
}

function test() {
  var big = new Uint8Array(64 * 1024 * 1024 + 2); // multiple of 3
  var str = makeString(big);
  console.log("big:", big.length);
  console.log("str:", str.length);
}

test();

As you can see makeString builds a string by appending 4 characters at a time. In this case it's going to build a string 89478988 in length (180meg) big. Since output is being appended to, the last time characters are appended there will be 2 strings in memory. The old one with 89478984 characters and the final one with 89478988. GC should collect any other used memory.

So, 64meg (the original array) + 180meg * 2 = 424meg. Well under the v8 limits.

But, if you run the sample it will fail with out of memory

<--- Last few GCs --->

    3992 ms: Scavenge 1397.9 (1458.1) -> 1397.9 (1458.1) MB, 0.2 / 0 ms (+ 1.5 ms in 1 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
    4450 ms: Mark-sweep 1397.9 (1458.1) -> 1397.9 (1458.1) MB, 458.0 / 0 ms (+ 2.9 ms in 2 steps since start of marking, biggest step 1.5 ms) [last resort gc].
    4909 ms: Mark-sweep 1397.9 (1458.1) -> 1397.9 (1458.1) MB, 458.7 / 0 ms [last resort gc].

$ node foo.js    
<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0x3a8521e3ac1 <JS Object>
    2: makeString(aka makeString) [/Users/gregg/src/foo.js:~6] [pc=0x1f83baf53a3b] (this=0x3a852104189 <undefined>,uint8=0x2ce813b51709 <an Uint8Array with map 0x32f492c0a039>)
    3: test(aka test) [/Users/gregg/src/foo.js:19] [pc=0x1f83baf4df7a] (this=0x3a852104189 <undefined>)
    4: /* anonymous */ [/Users/gregg/src/foo.js:24] [pc=0x1f83baf4d9e5] (this=0x2ce813b...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Abort trap: 6

Have tried both node 4.2.4 and 5.6.0

So, the question is WHY is it running out of memory?

Some things I tried.

  1. I tried joining chunks

    Rather than append to output indefinitely I tried checking if it's greater than some size (like 8k). If so I put it in an array and reset output to the empty string.

    By doing this output is never more than 8k big. The array holds 180meg + bookkeeping. So 180meg + 8k is a lot less than 180meg + 180meg. It still runs out of memory. Now, at the end of that process I join the array at which point it will actually use more memory (180meg + 180meg + bookeeping). But, v8 crashes before it gets to that line.

  2. I tried changing encode to just

    function encode(num) {
      return 'X';
    }
    

    In this case it actually runs to completion!! So I thought, "A-ha! the issue must be something related to lookup[num] generating a new string every call? So I tried ...

  3. Changed lookup to array of strings

    var lookup = Array.prototype.map.call(
        'superCaliFragilisticExpialidosiousThispartdoesnotrealllymattersd', 
        function(c) {
          return c;
        });
    

    Still runs out of memory

This seems like a bug in v8? It's not able to GC unused strings in some strange way because of this code although #2 vs #3 is strange as they seem equivalent in terms of memory usage.

Why does v8 run out of memory in these situations? (and is there a workaround)

like image 862
gman Avatar asked Feb 12 '16 04:02

gman


People also ask

How much memory does V8 use?

V8 itself is about 3 MB (at least on ARM, YMMV). Plus, you'd need another 2 MB or so for scratch space. So that's 12 MB in total, at minimum. You'd probably want to have 20-30 MB available in total for normal operation.

What is V8 heap memory?

In V8, the garbage collector is named Orinoco. It divides the heap memory space into 2 regions: young generation and old generation. This design is based on a generational hypothesis: In most cases, young objects are much more likely to die than old objects. And the young/old generation take different strategies.


1 Answers

TL;DR: Your example is a pathological case for one of v8's internal string representations. You can fix it by indexing into output once in a while (information on why below).

First, we can use heapdump to see what the garbage collector is up to:

enter image description here

The snapshot above was taken shortly before node runs out of memory. As you can see, most things look normal: we see two strings (the very large output and the small chunk to be added), three references to the same array big (of roughly 64MB, similar to what we'd expect), and many smaller items that don't look unusual.

But, one thing stands out: output is a whopping 1.4+ GB. At the time the snapshot was taken, it was roughly 80 million characters long, so ~160 MB assuming 2 bytes per character. How is this possible?

Maybe this has to do with v8's internal string representation. Quoting mraleph:

There are two types [of v8 strings] (actually more, but for the problem at hand only these two are important):

  • flat strings are immutable arrays of characters
  • cons strings are pairs of strings, result of concatenation.

If you concat a and b you get a cons-string (a, b) that represents result of concatenation. If you later concat d to that you get another cons-string ((a, b), d).

Indexing into such a "tree-like" string is not O(1) so to make it faster V8 flattens the string when you index: copies all characters into a flat string.

So could it be that v8 is representing output as a giant tree? One way to check is to force v8 to flatten the string (as suggested by mraleph above), for example by indexing into output at regular intervals inside the for loop:

if (i % 10000000 === 0) {
  // We don't do it at each iteration since it's relatively expensive.
  output[0];
}

And indeed, the program successfully runs!

One question still remains: why did version 2 above run? It seems that in that case v8 is able to optimize away most string concatenations (all those on the right hand side, which get converted to bitwise operations on a 4-element array).

like image 87
mtth Avatar answered Oct 03 '22 20:10

mtth