I am trying to reshape a data frame in R and it seems to have problems using the recommended ways of doing so. The data frame has the following structure:
ID DATE1 DATE2 VALTYPE VALUE
'abcd1233' 2009-11-12 2009-12-23 'TYPE1' 123.45
...
VALTYPE
is a string and is a factor with only 2 values (say TYPE1
and TYPE2
). I need to transform it into the following data frame ("wide" transpose) based on common ID and DATEs:
ID DATE1 DATE2 VALUE.TYPE1 VALUE.TYPE2
'abcd1233' 2009-11-12 2009-12-23 123.45 NA
...
The data frame has more than 4,500,000 observations (although about 70% of VALUE
s are NA
). The machine is an Intel-based Linux workstation with 4Gb of RAM. Loading the data (from a compressed Rdata file) into a fresh R process makes it grow to about 250Mb which clearly leaves a lot of space for reshaping.
These are my experiences so far:
Using vanilla reshape()
method:
tbl2 <- reshape(tbl, direction = "wide", idvar = c("ID", "DATE1", "DATE2"), timevar = "VALTYPE");
RESULT: Error: cannot allocate vector of size 4.8 Gb
Using cast()
method of reshape
package:
tbl2 <- cast(tbl, ID + DATE1 + DATE2 ~ VALTYPE);
RESULT: R process consumes all RAM with no end in sight. Had to kill the process eventually.
Using by()
and merge()
:
sp <- by(tbl[c(1,2,3,5)], tbl$VALTYPE, function(x) x); tbl <- merge(sp[["TYPE1"]], sp[["TYPE2"]], by = c("ID", "DATE1", "DATE2"), all = TRUE, sort = TRUE);
RESULT: works fine, although this is not very elegant and foolproof (i.e. it will break if more types are added).
To add insult to injury, the operation in question can be trivially achieved in about 3 lines of AWK or Perl (and with hardly any RAM used). So the question is: what is a better way to do this operation in R using recommended methods without consuming all available RAM?
A useful trick is to combine the id variables into a character vector and then do the reshape.
tbl$NEWID <- with(tbl, paste(ID, DATE1, DATE2, sep=";"))
tbl2 <- recast(tbl2, NEWID ~ VALTYPE, measure.var="VALUE")
It's about 40% faster in a problem of similar size in my intel core2 duo 2.2ghz macbook.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With