
cast function is extremely memory consuming, how to handle it?

I have a table that looks like this:

date    item_id    store_id   sale_num
1/1/15    33         1          10
1/1/15    33         2          12
1/1/15    33         3          15
1/1/15    44         1          54
1/1/15    44         3          66 
1/2/15    33         1          14
....  

I want to cast the table so that each store_id becomes its own column, with sale_num as the cell value. The result should look like this:

date    item_id   store1   store2   store3  
1/1/15   33         10       12       15
1/1/15   44         54       NA       66
1/2/15   33         14       NA       NA
......

When I do this with the cast function on a small sample (1,000 rows of the original table), there is no problem.

However, the original table has 38,000,000 rows and consumes 1.5 GB of memory in R. When I run cast on it, the function uses around 34 GB of memory and never finishes.

What is causing this? Is there an alternative approach?

lserlohn asked Feb 28 '26 14:02



1 Answer

We can use dcast from data.table, which should be more efficient than cast from reshape. We convert the 'data.frame' to a 'data.table' by reference with setDT(df1), so no copy of the data is made, and then call dcast.

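library(data.table)
# setDT() converts df1 to a data.table in place (no copy);
# paste0() builds the new column names store1, store2, ...
dcast(setDT(df1), date + item_id ~ paste0("store", store_id),
      value.var = "sale_num")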
#      date item_id store1 store2 store3
#1: 1/1/15      33     10     12     15
#2: 1/1/15      44     54     NA     66
#3: 1/2/15      33     14     NA     NA
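
For anyone who wants to try this before running it on the full 38,000,000 rows, here is a minimal, self-contained sketch. It assumes the data.table package is installed and rebuilds only the six sample rows shown in the question; the names df1 and wide are placeholders for illustration, not anything taken from the real data.

```r
library(data.table)

# Sample rows copied from the question (the real table is far larger)
df1 <- data.frame(
  date     = c("1/1/15", "1/1/15", "1/1/15", "1/1/15", "1/1/15", "1/2/15"),
  item_id  = c(33, 33, 33, 44, 44, 33),
  store_id = c(1, 2, 3, 1, 3, 1),
  sale_num = c(10, 12, 15, 54, 66, 14),
  stringsAsFactors = FALSE
)

# Convert by reference, then spread store_id into one column per store
wide <- dcast(setDT(df1), date + item_id ~ paste0("store", store_id),
              value.var = "sale_num")

print(wide)
```

Combinations of date, item_id and store that have no row in the input come out as NA, as in the desired table in the question. If zeros are preferred, dcast also accepts a fill argument (for example fill = 0).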
akrun answered Mar 03 '26 06:03



