 

Doing a plyr operation on every row of a data frame in R

Tags:

r

plyr

I like the plyr syntax. Any time I have to use one of the *apply() commands I end up kicking the dog and going on a 3-day bender. So, for the sake of my dog and my liver, what's a concise syntax for doing a ddply operation on every row of a data frame?

Here's an example that works well for a simple case:

    library(plyr)
    x <- rnorm(10)
    y <- rnorm(10)
    df <- data.frame(x, y)
    ddply(df, names(df), function(df) max(df$x, df$y))

That works fine and gives me what I want. But if things get more complex, plyr gets funky (and not like Bootsy Collins), because it is busy making "levels" out of all those floating-point values:

    x <- rnorm(1000)
    y <- rnorm(1000)
    z <- rnorm(1000)
    myLetters <- sample(letters, 1000, replace = TRUE)
    df <- data.frame(x, y, z, myLetters)
    ddply(df, names(df), function(df) max(df$x, df$y))

On my box this chews for a few minutes and then returns:

    Error: memory exhausted (limit reached?)
    In addition: Warning messages:
    1: In paste(rep(l, each = ll), rep(lvs, length(l)), sep = sep) :
      Reached total allocation of 1535Mb: see help(memory.size)
    2: In paste(rep(l, each = ll), rep(lvs, length(l)), sep = sep) :
      Reached total allocation of 1535Mb: see help(memory.size)
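A rough way to see where the memory goes. This rests on an assumption read off the warnings: the `paste(rep(l, each = ll), rep(lvs, length(l)))` calls suggest that this version of plyr built labels for every combination of the grouping columns' values. If so, the number of candidate combinations is the product of the distinct-value counts of all four columns, while the number of groups that actually occur can be no larger than the number of rows. The sketch below only computes those two numbers in base R. It does not call plyr, and the `set.seed()` call is an addition for reproducibility.

```r
set.seed(42)  # added for reproducibility; not in the original question
x <- rnorm(1000)
y <- rnorm(1000)
z <- rnorm(1000)
myLetters <- sample(letters, 1000, replace = TRUE)
df <- data.frame(x, y, z, myLetters)

# Distinct values per column: 1000 for each numeric column, at most 26 for the letters
distinctPerColumn <- sapply(df, function(col) length(unique(col)))
print(distinctPerColumn)

# Size of the full cross product, if every combination is enumerated
# (about 2.6e10 when all 26 letters appear)
possibleCombos <- prod(distinctPerColumn)
print(possibleCombos)

# Groups that actually occur: with continuous values, every row is its own group
observedGroups <- nrow(unique(df))
print(observedGroups)
```

Grouping by all columns of continuous data therefore buys nothing: each "group" is one row, but the candidate space is tens of billions of combinations.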

I'm not saying this is a bug in plyr; rather, I think I'm totally abusing it (liver and dog notwithstanding).

So, in short: is there a syntax shortcut for using ddply to operate on every row, as a substitute for apply(X, 1, ...)?

The workaround I've been using is to create a "key" with a unique value for every row, run ddply on that key, and then join the result back to the original data:

    x <- rnorm(1000)
    y <- rnorm(1000)
    z <- rnorm(1000)
    myLetters <- sample(letters, 1000, replace = TRUE)
    df <- data.frame(x, y, z, myLetters)

    # make the key
    df$myKey <- 1:nrow(df)
    myOut <- merge(df, ddply(df, "myKey", function(df) max(df$x, df$y)))

    # knock out the key
    myOut$myKey <- NULL
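A possible tightening of the key workaround above, sketched under the assumption that plyr is installed: instead of returning a bare scalar and merging it back, pass `transform` as the per-group function so that each one-row group comes back with a new column attached. Because ddply returns groups sorted by `myKey`, which is just the row index, the original row order is kept and no `merge()` is needed. The column name `maxXY` and the `set.seed()` call are illustrative choices, not part of the original code.

```r
library(plyr)  # assumes plyr is installed: install.packages("plyr")

set.seed(7)  # illustrative, for reproducibility
df <- data.frame(x = rnorm(1000), y = rnorm(1000), z = rnorm(1000),
                 myLetters = sample(letters, 1000, replace = TRUE))

# One unique key per row, so every ddply group is exactly one row
df$myKey <- seq_len(nrow(df))

# transform() returns the whole one-row piece plus the new column,
# so the result already contains every original column
myOut <- ddply(df, "myKey", transform, maxXY = max(x, y))

# knock out the key
myOut$myKey <- NULL
head(myOut)
```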

But I keep thinking that "There Has to Be a Better Way"

Thanks!

asked Jan 15 '10 at 20:01 by JD Long



1 Answer

Just treat it like an array and work on each row:

    adply(df, 1, transform, max = max(x, y))
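A hedged illustration of what `adply(df, 1, ...)` is doing conceptually, where margin `1` means rows: split the data frame into one-row pieces, apply `transform()` to each piece, and bind the pieces back together. The sketch below mimics that split-apply-combine step in base R without plyr, so it is an analogue rather than plyr's implementation. It also compares the result with `pmax()`, a vectorized base-R function (not mentioned in the original answer) that computes the same row-wise maximum for this particular example.

```r
set.seed(1)  # illustrative data, not from the original post
df <- data.frame(x = rnorm(10), y = rnorm(10),
                 myLetters = sample(letters, 10, replace = TRUE))

# Split into one-row data frames, one piece per row index (kept in row order)
rowPieces <- split(df, seq_len(nrow(df)))

# Apply transform() to each piece; max(x, y) is evaluated inside that one-row frame
transformed <- lapply(rowPieces, transform, max = max(x, y))

# Combine the pieces back into a single data frame
result <- do.call(rbind, transformed)
rownames(result) <- NULL

# For this function, the vectorized pmax() gives the same column in one call
print(all.equal(result$max, pmax(df$x, df$y)))  # TRUE
```

For a function as simple as `max(x, y)`, `pmax()` avoids building one data frame per row; the row-splitting pattern earns its keep when the per-row function is not vectorized.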
answered Sep 28 '22 at 03:09 by hadley
