Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Memory overhead of typed arrays vs strings

Tags:

I am trying to reduce the memory usage of a javascript web application that stores a lot of information in memory in the form of a large number of small strings. When I changed the code to use Uint8Array instead of String, I noticed that memory usage went up.

For example, consider the following code that creates many small strings:

// (1000000 strings) x (10 characters)
var a=[];
for (let i=0; i<1000000; i++)
    a.push("a".repeat(10).toUpperCase());

If you put it in an empty page and let the memory usage settle for a few seconds, it settles at 70 MiB on Google Chrome. On the other hand, the following code:

// (1000000 arrays) x (10 bytes)
var a=[];
for (let i=0; i<1000000; i++)
    a.push(new Uint8Array(10));

uses 233 MiB of memory. An empty page without any code uses about 20 MiB. On the other hand, if I create a small number of large strings/arrays, the difference becomes smaller and in the case of a single string/array with 10000000 characters/entries, the memory usage is virtually identical.

So why do typed arrays have such a large memory overhead?

like image 333
safsaf32 Avatar asked Aug 21 '17 18:08

safsaf32


People also ask

Which takes more space array or string?

A string of n characters uses 24 + n bytes: 1 byte per character plus 24 bytes object header. Similarly, a Uint8Array with n elements uses 112 + n bytes: 1 byte per element plus 112 bytes object header.

How much memory does a string array use?

If all arrays use the same string literals, in total they take up about 200KB of RAM. If each array contains unique randomly-generated strings (of the same lengths as in the first case), they take up about 2MB of RAM.

How much memory does string take in JavaScript?

Seems to indicate that a String is 2 bytes per character, and a boolean is 4 bytes.

Why might you use typed arrays instead of standard arrays?

Normal arrays can be as fast as typed arrays if you do a lot of sequential access. Random access outside the bounds of the array causes the array to grow. Typed arrays are fast for access, but slow to be allocated. If you create temporary arrays frequently, avoid typed arrays.


2 Answers

V8 developer here. Your conclusion makes sense: If you compare characters in a string to elements in a Uint8Array, the string will have less overhead. TypedArrays are great at providing fast access to typed elements; however having a large number of small TypedArrays is not memory efficient.

The difference is in the object header size for strings and typed arrays.

For a string, the object header is:

  1. hidden class pointer
  2. hash
  3. length
  4. payload

where the payload is rounded up to pointer size alignment, so 16 bytes in this case.

For a Uint8Array, you need the following:

  1. hidden class pointer
  2. properties pointer (unused)
  3. elements pointer (see below)
  4. array buffer pointer (see below)
  5. offset into array buffer
  6. byte length
  7. length of view into array buffer
  8. length (user-visible)
  9. embedder field #1
  10. embedder field #2

  11. array buffer: hidden class pointer

  12. array buffer: properties pointer (unused)
  13. array buffer: elements pointer (see below)
  14. array buffer: byte length
  15. array buffer: backing store
  16. array buffer: allocation base
  17. array buffer: allocation length
  18. array buffer: bit field (internal flags)
  19. array buffer: embedder field #1
  20. array buffer: embedder field #2

  21. elements object: hidden class pointer

  22. elements object: length (of the backing store)
  23. elements object: base pointer (of the backing store)
  24. elements object: offset to data start
  25. elements object: payload

where, again, the payload is rounded up to pointer size alignment, so consumes 16 bytes here.

In summary, each string consumes 5*8 = 40 bytes, each typed array consumes 26*8 = 208 bytes. That does seem like a lot of overhead; the reason is due to the various flexible options that TypedArrays provide (they can be overlapping views into ArrayBuffers, which can be allocated directly from JavaScript, or shared with WebGL and whatnot, etc).

(It's not about "optimizing memory allocation" nor being "better at garbage collecting strings" -- since you're holding on to all the objects, GC does not play a role.)

like image 80
jmrk Avatar answered Sep 26 '22 15:09

jmrk


The typed arrays are not supposed to be used that way.

If you want high memory efficiency, use just one typed array to hold all of your integer numbers. Instead of use a huge number of arrays to hold your integer numbers due to low level reasons.

Those low level reasons are related to how much overhead is need to hold one object in memory, and that quantity depends on a few aspects like immutability and garbage collection. In this case hold one typed array has higher overhead than hold one simple string. Thats why you should pay that price one time only

You should take advantage of:

var a = [];                       for (let i=0; i<1000000; i++) a.push("1");
var b = new Uint8Array(10000000); for (let i=0; i<1000000; i++) a[i] = 1;
// 'b' is more memory efficient than 'a', just pay the price of Uint8Array one time
// and save the wasted memory in string allocation overhead 
like image 32
Jairo Andres Velasco Romero Avatar answered Sep 22 '22 15:09

Jairo Andres Velasco Romero