Here's a strange behavior I ran into, and I can't find any hint as to why it happens. This example uses the estimate method of Spark's SizeEstimator. I haven't found any glitch in its code, so assuming it gives a good estimate of memory usage, why do I get this:
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.util.SizeEstimator

// Buffer of (Int, Double) tuples
val buf1 = new ArrayBuffer[(Int, Double)]
var i = 0
while (i < 3) {
  buf1 += ((i, i.toDouble))
  i += 1
}
// Size of the whole buffer, including its own overhead
println(s"Raw size with doubles: ${SizeEstimator.estimate(buf1)}")
// Sum of the sizes of the individual tuples
val ite1 = buf1.iterator
var size1: Long = 0L
while (ite1.hasNext) {
  val cur = ite1.next()
  size1 += SizeEstimator.estimate(cur)
}
println(s"Size with doubles: $size1")

// Same measurements with (Int, Float) tuples
val buf2 = new ArrayBuffer[(Int, Float)]
i = 0
while (i < 3) {
  buf2 += ((i, i.toFloat))
  i += 1
}
println(s"Raw size with floats: ${SizeEstimator.estimate(buf2)}")
val ite2 = buf2.iterator
var size2: Long = 0L
while (ite2.hasNext) {
  val cur = ite2.next()
  size2 += SizeEstimator.estimate(cur)
}
println(s"Size with floats: $size2")
The console output prints:
Raw size with doubles: 200
Size with doubles: 96
Raw size with floats: 272
Size with floats: 168
So my question is quite naive: why do floats take more memory than doubles in this case? And why does the gap get even worse when I go through an iterator and sum the element sizes (for the raw buffer the doubles/floats ratio is about 75%, and it drops to roughly 50% with the iterator)?
(For more context: I ran into this when trying to "optimize" a Spark application by changing Double to Float, and found out that floats actually took more memory than doubles...)
P.S.: This is not due to the small size of the buffers (3 here). If I put 100 elements instead, I get:
Raw size with doubles: 3752
Size with doubles: 3200
Raw size with floats: 6152
Size with floats: 5600
and floats still consume more memory... But the ratio has stabilized, so I guess the different ratios seen when going through the iterator must be due to some fixed overhead.
EDIT: It seems that Product2 is actually only specialized on Int, Long and Double:
trait Product2[@specialized(Int, Long, Double) +T1, @specialized(Int, Long, Double) +T2] extends Any with Product
Does anyone know why Float is not taken into account? Neither is Short, which leads to weird behaviors...
This is because Tuple2 is @specialized for Double but not specialized for Float.
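One way to see the specialization at work is to look at the runtime class of each tuple. The snippet below is only a minimal sketch, assuming a Scala 2.x compiler (where the specialized variants ship in the standard library under mangled names); the object and method names are made up for illustration.

```scala
object TupleSpecializationCheck {
  // Hypothetical helpers: return the runtime class name of each kind of tuple.
  def intDoubleClassName: String = {
    val t: (Int, Double) = (1, 1.0)
    t.getClass.getName
  }

  def intFloatClassName: String = {
    val t: (Int, Float) = (1, 1.0f)
    t.getClass.getName
  }

  def main(args: Array[String]): Unit = {
    // On Scala 2.x the first line is expected to name a specialized subclass
    // (scala.Tuple2$mcID$sp), the second the plain generic scala.Tuple2.
    println(s"(Int, Double) -> $intDoubleClassName")
    println(s"(Int, Float)  -> $intFloatClassName")
  }
}
```

If the two names differ as described, the (Int, Float) tuple is going through the generic class, so its elements are stored as boxed objects.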
That means (Int,Double) will be represented as a structure with two fields of the primitive Java types int and double, while (Int,Float) falls back to the generic Tuple2, whose fields are references to wrapper objects (java.lang.Integer and java.lang.Float), each with its own object header.
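The per-element numbers from the question (3200 / 100 = 32 bytes for doubles, 5600 / 100 = 56 bytes for floats) can be reproduced with some back-of-the-envelope arithmetic. This is a rough sketch rather than what SizeEstimator literally computes: it assumes a 64-bit HotSpot JVM with compressed references (12-byte object headers, 4-byte references, 8-byte alignment), and it assumes the specialized tuple class still carries the two unused reference fields inherited from the generic Tuple2. All names are illustrative.

```scala
object TupleLayoutSketch {
  // Assumed JVM layout parameters (64-bit HotSpot, compressed oops).
  val HeaderBytes = 12L
  val RefBytes = 4L

  // Round an object size up to the next multiple of 8 bytes.
  def align(n: Long): Long = (n + 7) / 8 * 8

  // Specialized (Int, Double): header + 2 inherited references + int + double.
  val specializedIntDouble: Long = align(HeaderBytes + 2 * RefBytes + 4 + 8)

  // Generic (Int, Float): the tuple itself plus two boxed values.
  val genericTuple: Long = align(HeaderBytes + 2 * RefBytes)
  val boxedInteger: Long = align(HeaderBytes + 4)
  val boxedFloat: Long = align(HeaderBytes + 4)
  val genericIntFloat: Long = genericTuple + boxedInteger + boxedFloat

  def main(args: Array[String]): Unit = {
    println(s"(Int, Double) per element: $specializedIntDouble bytes") // 32
    println(s"(Int, Float) per element:  $genericIntFloat bytes")      // 56
  }
}
```

Under these assumptions, three elements come to 96 and 168 bytes, which matches the "Size with doubles" and "Size with floats" lines in the question. The remaining difference between the raw and summed sizes is the same in both cases (104 bytes for 3 elements), which is consistent with it being the buffer's own overhead.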
More discussion here