I am running out of RAM in R with a data.table that contains ~100M rows and 40 columns full of doubles. My naive thought was that I could reduce the object size of the data.table by reducing the precision; there is no need for 15 digits after the decimal point. I played around with rounding, but as we know,
round(1.68789451154844878,3)
gives
1.6879999999999999
and does not help. Therefore, I transformed the values to integers. However, as the small examples below show for a numeric vector, this only gives a 50% reduction, from 8000040 bytes to 4000040 bytes, and the reduction does not improve when the precision is reduced further.
Is there a better way to do that?
set.seed(1)
options(digits=22)
a1 = rnorm(10^6)
a2 = as.integer(1000000*(a1))
a3 = as.integer(100000*(a1))
a4 = as.integer(10000*(a1))
a5 = as.integer(1000*(a1))
head(a1)
head(a2)
head(a3)
head(a4)
head(a5)
give
[1] -0.62645381074233242 0.18364332422208224 -0.83562861241004716 1.59528080213779155 0.32950777181536051 -0.82046838411801526
[1] -626453 183643 -835628 1595280 329507 -820468
[1] -62645 18364 -83562 159528 32950 -82046
[1] -6264 1836 -8356 15952 3295 -8204
[1] -626 183 -835 1595 329 -820
and
object.size(a1)
object.size(a2)
object.size(a3)
object.size(a4)
object.size(a5)
give
8000040 bytes
4000040 bytes
4000040 bytes
4000040 bytes
4000040 bytes
Not as such, no. In R, an integer takes 4 bytes and a double takes 8, regardless of the value stored. If you are allocating space for 1M integers, you are necessarily going to need 4M bytes of RAM for the vector of results.
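As a minimal sketch of why this is the case: the per-element cost is fixed by the storage type, not by the magnitude or precision of the values, so rounding a double vector leaves its size unchanged. (The exact header overhead reported by object.size() varies slightly between R versions, so the totals may differ by a few bytes from yours.)

n <- 10^6

# Size is determined by type alone: ~8*n bytes for doubles, ~4*n for integers.
object.size(double(n))
object.size(integer(n))

# Rounding does not change the type, so the object stays the same size.
x <- rnorm(n)
object.size(x)
object.size(round(x, 3))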