
Scala tuple memory overhead

What is the additional memory cost of a Tuple2[Int, Int], i.e. (1, 2), over two Ints stored without a tuple?

Lukasz asked Aug 02 '11 18:08



1 Answer

JVM overhead tends to be 16 to 24 bytes per object (on 32-bit and 64-bit JVMs respectively, though compressed pointers can make the latter smaller). Tuple2 is specialized on Int, which means it stores the values directly in fields, so you have 8 bytes for two bare Ints as compared to 8+16=24 or 8+24=32 bytes for (1,2).

If you use a similar non-specialized collection (or use Tuple2 for something that it is not specialized on, like Char), then the tuple holds pointers to boxed objects, and you may also have to pay for the boxed objects themselves, depending on whether they can be pre-allocated (arbitrary integers, no; arbitrary bytes, yes; arbitrary chars, maybe). If yes, you only pay for the two pointers plus the tuple's own overhead: 8+16=24 or 16+24=40 bytes. If no, you need three objects (the tuple and two boxes), so it's 16+8+2*(16+4) = 64 and 24+16+2*(24+4) = 96 bytes respectively.
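The arithmetic above can be reproduced with a small sketch. The function names are hypothetical, and the header and pointer sizes are simply the estimates used in this answer (16-byte header with 4-byte pointers, 24-byte header with 8-byte pointers); an actual JVM may round or pack objects differently.

```scala
// Rough size estimates, using this answer's assumed per-object overheads.
// header  = assumed per-object overhead in bytes (16 or 24)
// pointer = assumed reference size in bytes (4 or 8)
val IntBytes = 4

// Specialized Tuple2[Int, Int]: one object holding two Int fields.
def specializedTuple(header: Int): Int =
  header + 2 * IntBytes

// Non-specialized tuple whose boxed elements are pre-allocated (shared):
// one object holding two pointers.
def boxedTupleShared(header: Int, pointer: Int): Int =
  header + 2 * pointer

// Non-specialized tuple whose boxed elements must be allocated:
// the tuple plus two boxes, each with its own header and a 4-byte payload.
def boxedTupleFresh(header: Int, pointer: Int): Int =
  header + 2 * pointer + 2 * (header + IntBytes)

val on32 = (specializedTuple(16), boxedTupleShared(16, 4), boxedTupleFresh(16, 4))
val on64 = (specializedTuple(24), boxedTupleShared(24, 8), boxedTupleFresh(24, 8))
// on32 == (24, 24, 64)
// on64 == (32, 40, 96)
```

In every case the two bare Ints themselves account for only 8 of those bytes.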

Bottom line: objects use a lot more memory than primitive types, usually 3-4x, but sometimes over 10x. If you are short on memory, pack as much as you can into arrays. For example:

Bad for memory usage:

val a = (1 to 10000).map(x => (x,x.toString.length)).toArray

Good for memory usage:

val b = ((1 to 10000).toArray, (1 to 10000).map(_.toString.length).toArray)

If you're really tight on memory, you can then write iterators and other wrappers that let you index things as if they were an array of tuples instead of a tuple of arrays. It's a bit of a pain, but if you're really short on memory it can be worth it.
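As a minimal sketch of such a wrapper, assuming both columns are Ints: the class name IntPairArray and its members are made up for illustration and are not part of the Scala library. It keeps two primitive arrays and only builds a tuple when an element is read.

```scala
// Hypothetical wrapper: a tuple of arrays that reads like an array of tuples.
final class IntPairArray(val firsts: Array[Int], val seconds: Array[Int]) {
  require(firsts.length == seconds.length, "both arrays must have the same length")

  def length: Int = firsts.length

  // The tuple is created on demand and can be discarded right after use;
  // the long-lived storage is just the two primitive arrays.
  def apply(i: Int): (Int, Int) = (firsts(i), seconds(i))

  // Lets callers loop over the pairs without materializing them all at once.
  def iterator: Iterator[(Int, Int)] =
    Iterator.range(0, length).map(i => apply(i))
}

// Same data as the "good" example above, indexed like the "bad" one.
val pairs = new IntPairArray(
  (1 to 10000).toArray,
  (1 to 10000).map(_.toString.length).toArray
)
val tenth = pairs(9) // (10, 2)
```

The trade-off is that each read allocates a short-lived tuple, so this saves heap space rather than allocation work.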

Rex Kerr answered Oct 18 '22 19:10
