Scala - why do Doubles consume less memory than Floats in this case?

Here's a strange behavior I ran into, and I can't find any hint as to why it happens. In this example I use the estimate method of Spark's SizeEstimator, but I haven't found any glitch in its code. So, assuming it gives a good estimate of memory usage, I wonder why I get this:

import scala.collection.mutable.ArrayBuffer
import org.apache.spark.util.SizeEstimator

// Fill a buffer with (Int, Double) tuples
val buf1 = new ArrayBuffer[(Int,Double)]
var i = 0
while (i < 3) {
   buf1 += ((i,i.toDouble))
   i += 1
}
// Estimate the whole buffer (the tuples plus the buffer's own backing storage)
System.out.println(s"Raw size with doubles: ${SizeEstimator.estimate(buf1)}")
// Sum the estimates of each tuple taken on its own
val ite1 = buf1.toIterator
var size1: Long = 0L
while (ite1.hasNext) {
   val cur = ite1.next()
   size1 += SizeEstimator.estimate(cur)
}
System.out.println(s"Size with doubles: $size1")

// Same experiment with (Int, Float) tuples
val buf2 = new ArrayBuffer[(Int,Float)]
i = 0
while (i < 3) {
   buf2 += ((i,i.toFloat))
   i += 1
}
System.out.println(s"Raw size with floats: ${SizeEstimator.estimate(buf2)}")
val ite2 = buf2.toIterator
var size2: Long = 0L
while (ite2.hasNext) {
   val cur = ite2.next()
   size2 += SizeEstimator.estimate(cur)
}
System.out.println(s"Size with floats: $size2")

The console output prints:

Raw size with doubles: 200
Size with doubles: 96
Raw size with floats: 272
Size with floats: 168

So my question is quite naive: why do floats take more memory than doubles in this case? And why does it get even worse when I convert the buffer into an iterator? In the first case the doubles/floats ratio is about 74% (200/272), and it drops to about 57% (96/168) when iterating!

(For more context: I ran into this while trying to "optimize" a Spark application by changing Double to Float, and found that it actually used more memory with floats than with doubles...)

P.S.: it's not due to the small buffer size (3 here); if I use 100 instead, I get:

Raw size with doubles: 3752
Size with doubles: 3200
Raw size with floats: 6152
Size with floats: 5600

and floats still consume more memory... But the ratio has stabilized, so I guess the different ratios after converting to an iterator must be due to some overhead.

EDIT: It seems that Product2 is actually only specialized for Int, Long and Double:

trait Product2[@specialized(Int, Long, Double) +T1, @specialized(Int, Long, Double) +T2] extends Any with Product

Does anyone know why Float is not taken into account? Short isn't either, which leads to weird behaviors...

Vince.Bdn asked Feb 24 '16 09:02


1 Answer

This is because Tuple2 is @specialized for Double but not for Float.

That means (Int,Double) is represented by a specialized subclass with fields of the primitive Java types int and double, while (Int,Float) falls back to the generic Tuple2, whose two fields are Object references to the wrapper objects java.lang.Integer and java.lang.Float.
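One way to check this without Spark is to inspect the runtime class of each tuple with Java reflection. This is a minimal sketch, assuming Scala 2.11 to 2.13 (where Tuple2's type parameters are @specialized); TupleSpecializationDemo and describe are hypothetical names introduced only for illustration:

```scala
import java.lang.reflect.Modifier

object TupleSpecializationDemo {
  // Returns the runtime class name of `t` together with the type names of the
  // primitive instance fields declared directly on that class (statics are skipped).
  def describe(t: AnyRef): (String, Set[String]) = {
    val cls = t.getClass
    val primitiveFields: Set[String] = cls.getDeclaredFields.toSeq
      .filterNot(f => Modifier.isStatic(f.getModifiers))
      .map(_.getType)
      .filter(_.isPrimitive)
      .map(_.getName)
      .toSet
    (cls.getName, primitiveFields)
  }

  def main(args: Array[String]): Unit = {
    val withDouble: (Int, Double) = (1, 1.0)
    val withFloat: (Int, Float) = (1, 1.0f)
    // Expected on Scala 2.x: a specialized subclass (such as scala.Tuple2$mcID$sp)
    // that declares primitive int and double fields
    println(describe(withDouble))
    // Expected on Scala 2.x: the generic scala.Tuple2, whose fields are Object references
    println(describe(withFloat))
  }
}
```

Because no (Int, Float) variant is generated, the second tuple stores both values boxed, which is where the extra memory comes from.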

More discussion here
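The sizes in the question can be reproduced by hand from this layout. The sketch below is only a rough model, assuming a 64-bit HotSpot JVM with compressed oops (12-byte object headers, 4-byte references, objects padded to a multiple of 8 bytes); it also assumes the specialized subclass still carries the two unused Object fields inherited from Tuple2, and that in the generic Tuple2 both values are boxed (java.lang.Integer as well as java.lang.Float). TupleSizeArithmetic and its members are hypothetical names:

```scala
object TupleSizeArithmetic {
  // Assumed constants for a 64-bit HotSpot JVM with compressed oops (not queried from the JVM).
  val HeaderBytes = 12 // object header
  val RefBytes = 4     // compressed object reference

  // Round an object size up to a multiple of 8 bytes.
  def alignTo8(n: Int): Int = (n + 7) / 8 * 8

  // Specialized (Int, Double) tuple: two unused Object fields inherited from Tuple2,
  // plus a primitive int (4 bytes) and a primitive double (8 bytes).
  val intDoubleTuple: Int = alignTo8(HeaderBytes + 2 * RefBytes + 4 + 8)

  // A boxed java.lang.Integer or java.lang.Float: header plus a 4-byte value.
  val boxed4: Int = alignTo8(HeaderBytes + 4)

  // Generic (Int, Float) tuple: two Object references, each pointing to its own box.
  val intFloatTuple: Int = alignTo8(HeaderBytes + 2 * RefBytes) + 2 * boxed4

  def main(args: Array[String]): Unit = {
    println(s"(Int, Double): $intDoubleTuple bytes per tuple") // 32
    println(s"(Int, Float): $intFloatTuple bytes per tuple")   // 56
  }
}
```

Under these assumptions 3 × 32 = 96 and 3 × 56 = 168, and 100 × 32 = 3200 and 100 × 56 = 5600, matching the per-tuple sums in the question. Subtracting those sums from the raw estimates leaves 104 bytes (3 elements) and 552 bytes (100 elements) for both element types: that is the buffer's own overhead, which is the same whichever tuple type it holds. Iterating does not change the tuples; it just leaves that shared overhead out, which is why the ratio looks worse.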

Odomontois answered Nov 13 '22 20:11