I am trying to reduce the memory usage of a javascript web application that stores a lot of information in memory in the form of a large number of small strings. When I changed the code to use Uint8Array
instead of String
, I noticed that memory usage went up.
For example, consider the following code that creates many small strings:
// (1000000 strings) x (10 characters)
var a=[];
for (let i=0; i<1000000; i++)
a.push("a".repeat(10).toUpperCase());
If you put it in an empty page and let the memory usage settle for a few seconds, it settles at 70 MiB on Google Chrome. On the other hand, the following code:
// (1000000 arrays) x (10 bytes)
var a=[];
for (let i=0; i<1000000; i++)
a.push(new Uint8Array(10));
uses 233 MiB of memory. An empty page without any code uses about 20 MiB. On the other hand, if I create a small number of large strings/arrays, the difference becomes smaller and in the case of a single string/array with 10000000 characters/entries, the memory usage is virtually identical.
So why do typed arrays have such a large memory overhead?
A string of n characters uses 24 + n bytes: 1 byte per character plus 24 bytes object header. Similarly, a Uint8Array with n elements uses 112 + n bytes: 1 byte per element plus 112 bytes object header.
If all arrays use the same string literals, in total they take up about 200KB of RAM. If each array contains unique randomly-generated strings (of the same lengths as in the first case), they take up about 2MB of RAM.
Seems to indicate that a String is 2 bytes per character, and a boolean is 4 bytes.
Normal arrays can be as fast as typed arrays if you do a lot of sequential access. Random access outside the bounds of the array causes the array to grow. Typed arrays are fast for access, but slow to be allocated. If you create temporary arrays frequently, avoid typed arrays.
V8 developer here. Your conclusion makes sense: If you compare characters in a string to elements in a Uint8Array, the string will have less overhead. TypedArrays are great at providing fast access to typed elements; however having a large number of small TypedArrays is not memory efficient.
The difference is in the object header size for strings and typed arrays.
For a string, the object header is:
where the payload is rounded up to pointer size alignment, so 16 bytes in this case.
For a Uint8Array, you need the following:
embedder field #2
array buffer: hidden class pointer
array buffer: embedder field #2
elements object: hidden class pointer
where, again, the payload is rounded up to pointer size alignment, so consumes 16 bytes here.
In summary, each string consumes 5*8 = 40 bytes, each typed array consumes 26*8 = 208 bytes. That does seem like a lot of overhead; the reason is due to the various flexible options that TypedArrays provide (they can be overlapping views into ArrayBuffers, which can be allocated directly from JavaScript, or shared with WebGL and whatnot, etc).
(It's not about "optimizing memory allocation" nor being "better at garbage collecting strings" -- since you're holding on to all the objects, GC does not play a role.)
The typed arrays are not supposed to be used that way.
If you want high memory efficiency, use just one typed array to hold all of your integer numbers. Instead of use a huge number of arrays to hold your integer numbers due to low level reasons.
Those low level reasons are related to how much overhead is need to hold one object in memory, and that quantity depends on a few aspects like immutability and garbage collection. In this case hold one typed array has higher overhead than hold one simple string. Thats why you should pay that price one time only
You should take advantage of:
var a = []; for (let i=0; i<1000000; i++) a.push("1");
var b = new Uint8Array(10000000); for (let i=0; i<1000000; i++) a[i] = 1;
// 'b' is more memory efficient than 'a', just pay the price of Uint8Array one time
// and save the wasted memory in string allocation overhead
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With