Subsetting ffdf objects in R

Tags: r, ff

I'm using R's ff package and I've got some ffdf objects (dimensions around 1.5M x 80) that I need to work with. I'm having some trouble getting my head around the efficient slicing/dicing operations though.

For instance, I've got two integer columns named "YEAR" and "AGE", and I want to make a table of AGE when the YEAR is 1999.

One approach is this:

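library(ff)  # ff depends on bit, which provides bit()

# Build a bit vector marking the rows of the ffdf `x` for which `expr` is TRUE,
# evaluating the expression one chunk of rows at a time.
ffwhich <- function(x, expr) {
  e <- substitute(expr)  # capture the unevaluated expression once
  b <- bit(nrow(x))
  for(i in chunk(x)) b[i] <- eval(e, x[i,], parent.frame())
  b
}
bw <- ffwhich(a.fdf, YEAR==1999)
answer <- table(a.fdf[bw, "AGE"])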

The table() operation is fast but building the bit vector is quite slow. Anyone have any recommendations for doing this better?

Ken Williams asked Dec 03 '10 20:12


1 Answer

The ffbase package provides many base functions for ff/ffdf objects, including subset.ff. In limited testing, subset.ff seems relatively fast. Try loading ffbase and then using the simpler code you suggested in a previous comment (with(subset(a.fdf, YEAR==1999))).
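
As a rough sketch of that suggestion, the snippet below builds a small toy ffdf and tabulates AGE for one YEAR. It assumes ff and ffbase are both installed. The toy data and the name `sub` are made up for illustration; `a.fdf` stands in for the real 1.5M x 80 object. Instead of with(), it pulls the single AGE column into RAM, which is one way to do the final step and not necessarily what the earlier comment had in mind.

```r
library(ff)
library(ffbase)

# Toy stand-in for a.fdf (hypothetical data, much smaller than the real object)
set.seed(1)
n <- 10000
df <- data.frame(
  YEAR = sample(1995:2005, n, replace = TRUE),
  AGE  = sample(18:90, n, replace = TRUE)
)
a.fdf <- as.ffdf(df)

# With ffbase loaded, subset() on an ffdf returns a new ffdf
# holding only the matching rows
sub <- subset(a.fdf, YEAR == 1999)

# Read just the AGE column of the subset into RAM and tabulate it
answer <- table(sub$AGE[])
```

Only one column of the filtered rows is read into memory here, so the approach should remain workable when the full ffdf is too large to load at once.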

dnlbrky answered Sep 21 '22 23:09