I'm using R's ff package and have some ffdf objects (around 1.5M rows x 80 columns) that I need to work with. I'm having some trouble getting my head around how to slice and dice them efficiently, though.
For instance, I've got two integer columns named "YEAR" and "AGE", and I want to make a table of AGE when YEAR is 1999.
One approach is this:
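ffwhich <- function(x, expr) {
  e <- substitute(expr)    # capture the filter expression once, unevaluated
  env <- parent.frame()    # resolve non-column names in the caller's environment
  b <- bit(nrow(x))        # one bit per row, initially FALSE
  # evaluate the filter chunk by chunk; x[i, ] reads every column of the chunk into RAM
  for (i in chunk(x)) b[i] <- eval(e, x[i, ], env)
  b
}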
bw <- ffwhich(a.fdf, YEAR==1999)
answer <- table(a.fdf[bw, "AGE"])
The table() operation is fast, but building the bit vector is quite slow. Does anyone have recommendations for doing this better?
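One plausible contributor to the slowness (an assumption, not something measured here) is that x[i, ] pulls all ~80 columns of every chunk into RAM even though the filter only looks at YEAR. Below is a minimal sketch of a column-selective variant; the name ffwhich2 and the small synthetic a.fdf are hypothetical, made up for illustration. It reads only the columns the expression mentions, using [[ on the ffdf and the ri chunk indices from chunk().

```r
library(ff)  # attaches bit, which provides bit() and the ri chunk index

# Hypothetical helper: like ffwhich(), but only the columns referenced in
# `expr` are read from disk for each chunk, instead of the whole row via x[i, ].
ffwhich2 <- function(x, expr) {
  e <- substitute(expr)                     # capture the unevaluated filter
  env <- parent.frame()                     # caller's env, for non-column names
  cols <- intersect(all.vars(e), names(x))  # columns the filter actually uses
  b <- bit(nrow(x))                         # one bit per row, all FALSE
  for (i in chunk(x)) {
    # read only the needed columns for this chunk into an in-RAM named list
    chunk_data <- lapply(cols, function(v) x[[v]][i])
    names(chunk_data) <- cols
    b[i] <- eval(e, chunk_data, env)
  }
  b
}

# Hypothetical stand-in for the real 1.5M x 80 ffdf: two integer columns only
set.seed(1)
n <- 1e5
a.fdf <- as.ffdf(data.frame(YEAR = sample(1995:2005, n, TRUE),
                            AGE  = sample(18:90, n, TRUE)))

bw <- ffwhich2(a.fdf, YEAR == 1999)
```

The resulting bit vector is meant to be used the same way as bw in the question, e.g. table(a.fdf[bw, "AGE"]). With only two columns the saving is negligible; any gain would come from skipping the ~78 unused columns in the real data.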
The ffbase package provides many base functions for ff/ffdf objects, including subset.ff. In some limited testing, subset.ff seems to be relatively fast. Try loading ffbase and then using the simpler code you suggested in an earlier comment, with(subset(a.fdf, YEAR==1999), ...).
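A minimal sketch of that suggestion, assuming ffbase is installed; the small synthetic a.fdf (random YEAR and AGE values) is made up for illustration. Calling subset() on an ffdf should dispatch to ffbase's subset.ffdf and return a new ffdf; its AGE column is then pulled into RAM with [] before calling table().

```r
library(ff)
library(ffbase)  # provides subset() methods for ff/ffdf objects

# Hypothetical stand-in for the real 1.5M x 80 ffdf: two integer columns only
set.seed(1)
n <- 1e5
a.fdf <- as.ffdf(data.frame(YEAR = sample(1995:2005, n, TRUE),
                            AGE  = sample(18:90, n, TRUE)))

# Filter rows; the result is another ffdf, not an in-RAM data.frame
sub <- subset(a.fdf, YEAR == 1999)

# Pull the (much smaller) filtered AGE column into RAM and tabulate it
answer <- table(sub$AGE[])
```

Only the filtered AGE values end up in RAM, which for a single integer column of a 1.5M-row table should be small either way.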