
R ddply, applying if and ifelse functions

Tags:

r

plyr

I'm trying to apply a function to a data frame using ddply from the plyr package, but I'm getting some results that I don't understand. I have three questions about the results.

Given:

mydf <- data.frame(n  = c(12, 34, 9, 3, 22, 55),
                   x  = c(1, 2, 1, 1, 2, 2),
                   x1 = c(0, 1, 2, 1, 1, 2))

mydf looks like this:

   n x x1
1 12 1  0
2 34 2  1
3  9 1  2
4  3 1  1
5 22 2  1
6 55 2  2

Question #1

If I do:

k <- function(x) {
  mydf$z <- ifelse(x == 1, 0, mydf$n)
  return (mydf)
}
mydf <- ddply(mydf, c("x") , .fun = k, .inform = TRUE)

I get the following error:

Error in `$<-.data.frame`(`*tmp*`, "z", value = structure(c(12, 34, 9,  : 
  replacement has 3 rows, data has 6
Error: with piece 1: 
   n x x1
1 12 1  0
2  9 1  2
3  3 1  1

I get this error regardless of whether I specify the variable to split by as c("x"), "x", or .(x). I don't understand why I'm getting this error message.

Question #2

But, what I really want to do is set up an if/else function because my dataset has variables x1, x2, x3, and x4 and I want to take those variables into account as well. But when I try something simple such as:

j <- function(x) {
  if(x == 1){
    mydf$z <- 0
  } else {
    mydf$z <- mydf$n
  }
  return(mydf)
  }

mydf <- ddply(mydf, x, .fun = j, .inform = TRUE)

I get:

Warning messages:
1: In if (x == 1) { :
  the condition has length > 1 and only the first element will be used
2: In if (x == 1) { :
  the condition has length > 1 and only the first element will be used

Question #3

I'm confused about when to use function() and when to use function(x). Using function() for either j() or k() gives me a different error:

Error in .fun(piece, ...) : unused argument (piece)
Error: with piece 1: 
    n x x1  z
1  12 1  0 12
2   9 1  2  9
3   3 1  1  3
4  12 1  0 12
5   9 1  2  9
6   3 1  1  3
7  12 1  0 12
8   9 1  2  9
9   3 1  1  3
10 12 1  0 12
11  9 1  2  9
12  3 1  1  3

where column z is not correct. Yet I see a lot of functions written as function().

I sincerely appreciate any comments that can help me out with this.

SCallan asked Aug 29 '13 20:08


1 Answer

There's a lot that needs explaining here. Let's start with the simplest case. In your first example, all you need is:

mydf$z <- with(mydf,ifelse(x == 1,0,n))

An equivalent ddply solution might look like this:

ddply(mydf,.(x),transform,z = ifelse(x == 1,0,n))
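The reason ifelse works here, while the plain if in your second attempt warns, is that ifelse is vectorized (it returns one value per element of the condition), whereas if expects a single TRUE or FALSE. A minimal sketch in base R, rebuilding mydf from the question; the name piece is just an illustrative stand-in for one group:

```r
# Rebuild the example data from the question.
mydf <- data.frame(n  = c(12, 34, 9, 3, 22, 55),
                   x  = c(1, 2, 1, 1, 2, 2),
                   x1 = c(0, 1, 2, 1, 1, 2))

# ifelse() evaluates the condition element by element.
z <- with(mydf, ifelse(x == 1, 0, n))
print(z)  # 0 34 0 0 22 55

# if() needs exactly one TRUE/FALSE, so collapse the vector
# explicitly (here with all()) before testing it.
piece <- mydf[mydf$x == 1, ]  # hypothetical single group
if (all(piece$x == 1)) {
  piece$z <- 0
} else {
  piece$z <- piece$n
}
print(piece)
```

In recent versions of R (4.2.0 and later), passing a condition of length greater than one to if is an error rather than a warning.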

Your biggest source of confusion is probably what ddply actually passes as the argument to the function you give it.

Consider your first attempt:

k <- function(x) {
  mydf$z <- ifelse(x == 1, 0, mydf$n)
  return (mydf)
}

The way ddply works is that it splits mydf up into several smaller data frames, based on the values in the column x. That means that each time ddply calls k, the argument passed to k is a data frame. Specifically, it is a subset of your primary data frame.
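To make the splitting step concrete, here is a rough base R sketch of it. It uses split() to imitate what happens conceptually; this is an illustration under that assumption, not plyr's actual implementation, and the name pieces is made up for the example:

```r
# Rebuild the example data from the question.
mydf <- data.frame(n  = c(12, 34, 9, 3, 22, 55),
                   x  = c(1, 2, 1, 1, 2, 2),
                   x1 = c(0, 1, 2, 1, 1, 2))

# Split the data frame by the values of column x.
# The result is a list with one data frame per distinct value.
pieces <- split(mydf, mydf$x)

# Each element is a full data frame (all columns, a subset of rows),
# and that is the kind of object your function receives.
for (value in names(pieces)) {
  cat("piece for x =", value, "\n")
  print(pieces[[value]])
}
```

Each printed piece has the columns n, x and x1 but only the rows for one value of x, which matches the 3-row piece shown in your error message.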

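So within k, x is a subset of mydf with all of its columns. That is where your first error comes from: the value computed from the 3-row piece does not fit into the 6-row global mydf. You should not try to modify mydf from within k. Instead, modify x and return the modified version. (That works if you must write your own function, but the options I showed above are better.) We might rewrite your k like this: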

k <- function(x) {
  x$z <- ifelse(x$x == 1, 0, x$n)
  return (x)
}

Note that using x both as the argument to k and as the name of one of your columns makes this confusing to read.
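One way to sidestep that naming clash is to give the argument a name that says what it is. The sketch below assumes plyr is installed; add_z and piece are names chosen for illustration only:

```r
library(plyr)

# Rebuild the example data from the question.
mydf <- data.frame(n  = c(12, 34, 9, 3, 22, 55),
                   x  = c(1, 2, 1, 1, 2, 2),
                   x1 = c(0, 1, 2, 1, 1, 2))

# The argument is the per-group data frame, so call it "piece"
# to keep it distinct from the column named x.
add_z <- function(piece) {
  piece$z <- ifelse(piece$x == 1, 0, piece$n)
  piece  # return the modified piece, not the global mydf
}

# ddply calls add_z once per value of x and row-binds the results.
result <- ddply(mydf, "x", add_z)
print(result)
```

The function has to accept one argument because ddply always hands it the current piece; that is why a function declared as function() fails with "unused argument (piece)".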

joran answered Oct 19 '22 19:10