Apply multiple functions to each row of a dataframe

Every time I think I understand working with vectors, what appears to be a simple problem turns my head inside out. Lots of reading and trying different examples hasn't helped on this occasion. Please spoon-feed me here...

I want to apply two custom functions to each row of a dataframe and add the results as two new columns. Here is my sample code:

# Required packages:
library(plyr)

FindMFE <- function(x) {
    MFE <- max(x, na.rm = TRUE) 
    MFE <- ifelse(is.infinite(MFE ) | (MFE  < 0), 0, MFE)
    return(MFE)
}

FindMAE <- function(x) {
    MAE <- min(x, na.rm = TRUE) 
    MAE <- ifelse(is.infinite(MAE) | (MAE> 0), 0, MAE)
    return(MAE)
}

FindMAEandMFE <- function(x){
        # I know this next line is wrong...
    z <- apply(x, 1, FindMFE, FindMFE)
        return(z)
}

df1 <- data.frame(Bar1=c(1,2,3,-3,-2,-1),Bar2=c(3,1,3,-2,-3,-1))

df1 = transform(df1, 
    FindMAEandMFE(df1)  
)

#DF1 should end up with the following data...
#Bar1   Bar2    MFE MAE
#1      3       3   0
#2      1       2   0
#3      3       3   0
#-3     -2      0   -3
#-2     -3      0   -3
#-1     -1      0   -1

It would be great to get an answer using the plyr library and another using a more base-R-like approach. Both will aid my understanding. Of course, please point out where I'm going wrong if it's obvious. ;-)

Now back to the help files for me!

Edit: I would like a multivariate solution as column names may change and expand over time. It also allows re-use of the code in future.

Look Left asked Aug 24 '11 10:08


2 Answers

I think you are overcomplicating this. What is wrong with two separate apply() calls? There is, however, a far better way to do this that involves no looping or apply() calls. I'll deal with these separately, but the second solution is preferable as it is truly vectorised.

Two apply calls version

First, two separate apply() calls using only base R functions:

df1 <- data.frame(Bar1=c(1,2,3,-3,-2,-1),Bar2=c(3,1,3,-2,-3,-1))
df1 <- transform(df1, MFE = apply(df1, 1, FindMFE), MAE = apply(df1, 1, FindMAE))
df1

Which gives:

> df1
  Bar1 Bar2 MFE MAE
1    1    3   3   0
2    2    1   2   0
3    3    3   3   0
4   -3   -2   0  -3
5   -2   -3   0  -3
6   -1   -1   0  -1

OK, looping over the rows of df1 twice is perhaps a little inefficient, but even for big problems you have already spent more time thinking about how to do this cleverly in a single pass than you will save by doing it that way.
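If a single pass over the rows is still wanted, one possible base R sketch is to wrap both functions in a small helper that returns a named vector. FindBoth is a hypothetical name introduced here for illustration; FindMFE and FindMAE are the question's functions, lightly condensed.

```r
# Sample data and helper functions from the question.
df1 <- data.frame(Bar1 = c(1, 2, 3, -3, -2, -1), Bar2 = c(3, 1, 3, -2, -3, -1))

FindMFE <- function(x) {
  MFE <- max(x, na.rm = TRUE)
  ifelse(is.infinite(MFE) | (MFE < 0), 0, MFE)
}

FindMAE <- function(x) {
  MAE <- min(x, na.rm = TRUE)
  ifelse(is.infinite(MAE) | (MAE > 0), 0, MAE)
}

# Hypothetical helper (not in the question): both values for one row.
FindBoth <- function(x) c(MFE = FindMFE(x), MAE = FindMAE(x))

# apply() over rows returns one column per row of df1, so the result is
# transposed before being bound to the original data frame.
res <- cbind(df1, t(apply(df1, 1, FindBoth)))
res
```

Because the helper only sees one row at a time and never refers to column names, this should keep working if columns are added or renamed.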

Using vectorised functions pmax() and pmin()

So a better way of doing this is to note the pmax() and pmin() functions and realise that they can do what each of the apply(df1, 1, FindFOO) calls was doing. For example:

> (tmp <- with(df1, pmax(0, Bar1, Bar2, na.rm = TRUE)))
[1] 3 2 3 0 0 0

would be MFE from your question. This is very simple to work with if you always have two columns and they are Bar1 and Bar2, or the first two columns of df1. But it is not very general; what if you have more columns you want to compute this over? pmax(df1[, 1:2], na.rm = TRUE) won't do what we want, because pmax() then receives the data frame as a single argument, has nothing to compare it with, and returns it unchanged:

> pmax(df1[, 1:2], na.rm = TRUE)
  Bar1 Bar2
1    1    3
2    2    1
3    3    3
4   -3   -2
5   -2   -3
6   -1   -1

The trick to getting a general solution using pmax() and pmin() is to use do.call() to arrange the calls to those two functions for us. Updating your functions to use this idea we have:

FindMFE2 <- function(x) {
   MFE <- do.call(pmax, c(as.list(x), 0, na.rm = TRUE))
   MFE[is.infinite(MFE)] <- 0
   MFE
}

FindMAE2 <- function(x) {
   MAE <- do.call(pmin, c(as.list(x), 0, na.rm = TRUE))
   MAE[is.infinite(MAE)] <- 0
   MAE
}

which give:

> transform(df1, MFE = FindMFE2(df1), MAE = FindMAE2(df1))
  Bar1 Bar2 MFE MAE
1    1    3   3   0
2    2    1   2   0
3    3    3   3   0
4   -3   -2   0  -3
5   -2   -3   0  -3
6   -1   -1   0  -1

and not an apply() in sight. If you want to do this in a single step, this is now much easier to wrap:

FindMAEandMFE2 <- function(x){
    cbind(MFE = FindMFE2(x), MAE = FindMAE2(x))
}

which can be used as:

> cbind(df1, FindMAEandMFE2(df1))
  Bar1 Bar2 MFE MAE
1    1    3   3   0
2    2    1   2   0
3    3    3   3   0
4   -3   -2   0  -3
5   -2   -3   0  -3
6   -1   -1   0  -1
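
To see what do.call() is doing inside FindMFE2(), it may help to build the argument list by hand and compare the result with the call written out in full. This is only a sketch for the two-column df1 from the question; args, viaDoCall and written are illustrative names, not part of the functions above.

```r
df1 <- data.frame(Bar1 = c(1, 2, 3, -3, -2, -1), Bar2 = c(3, 1, 3, -2, -3, -1))

# One list element per column, followed by the constant 0 and na.rm = TRUE.
args <- c(as.list(df1), 0, na.rm = TRUE)
names(args)

# do.call() passes each list element to pmax() as a separate argument,
# so for this data frame it should match the call written out by hand.
viaDoCall <- do.call(pmax, args)
written   <- pmax(Bar1 = df1$Bar1, Bar2 = df1$Bar2, 0, na.rm = TRUE)
all.equal(viaDoCall, written)
```

The advantage of the do.call() form is that the argument list grows with the data frame, so no column names have to be typed into the call.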
Gavin Simpson answered Oct 02 '22 09:10


I show three alternative one-liners:

  • Using the each function of plyr
  • Using the plyr each function with base R
  • Using the pmin and pmax functions, which are vectorised

Solution 1: plyr and each

The plyr package defines the each function that does what you want. From ?each: "Aggregate multiple functions into a single function." This means you can solve your problem using a one-liner:

library(plyr)
adply(df1, 1, each(MFE=function(x)max(x, 0), MAE=function(x)min(x, 0)))

  Bar1 Bar2 MFE MAE
1    1    3   3   0
2    2    1   2   0
3    3    3   3   0
4   -3   -2   0  -3
5   -2   -3   0  -3
6   -1   -1   0  -1

Solution 2: each and base R

You can, of course, use each with base functions. Here is how you can use it with apply - just note that you have to transpose the results before adding to your original data.frame.

library(plyr)
data.frame(df1, 
  t(apply(df1, 1, each(MFE=function(x)max(x, 0), MAE=function(x)min(x, 0)))))

  Bar1 Bar2 MFE MAE
1    1    3   3   0
2    2    1   2   0
3    3    3   3   0
4   -3   -2   0  -3
5   -2   -3   0  -3
6   -1   -1   0  -1
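
The need for t() can be checked by inspecting the shape of what apply() returns. The sketch below uses plain base R, with a hypothetical function both standing in for the each(...) call, so it runs without plyr.

```r
df1 <- data.frame(Bar1 = c(1, 2, 3, -3, -2, -1), Bar2 = c(3, 1, 3, -2, -3, -1))

# Hypothetical stand-in for each(...): a named vector of length 2 per row.
both <- function(x) c(MFE = max(x, 0), MAE = min(x, 0))

# When the function returns a vector, apply() stores each result as a column.
raw <- apply(df1, 1, both)
dim(raw)     # 2 rows (MFE, MAE) by 6 columns, one per row of df1
dim(t(raw))  # 6 by 2, which lines up with the rows of df1
```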

Solution 3: using vectorised functions

Using vectorised functions pmin and pmax, you can use this one-liner:

transform(df1, MFE=pmax(0, Bar1, Bar2), MAE=pmin(0, Bar1, Bar2))

  Bar1 Bar2 MFE MAE
1    1    3   3   0
2    2    1   2   0
3    3    3   3   0
4   -3   -2   0  -3
5   -2   -3   0  -3
6   -1   -1   0  -1
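
One difference from the question's functions is that this one-liner does not pass na.rm = TRUE. A small sketch of the effect, using a hypothetical data frame df2 with one missing value:

```r
# Hypothetical data with one NA, to compare the two na.rm settings.
df2 <- data.frame(Bar1 = c(1, NA, -3), Bar2 = c(3, 2, -2))

# Default behaviour: an NA in any column makes that row's result NA.
mfeDefault <- with(df2, pmax(0, Bar1, Bar2))

# With na.rm = TRUE the NA is skipped, as in FindMFE() from the question.
mfeNaRm <- with(df2, pmax(0, Bar1, Bar2, na.rm = TRUE))

data.frame(df2, mfeDefault, mfeNaRm)
```

If the data can contain missing values, adding na.rm = TRUE to both pmax() and pmin() in the one-liner should bring it in line with the original functions.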
Andrie answered Oct 02 '22 09:10