Every time I think I understand working with vectors, what appears to be a simple problem turns my head inside out. Lots of reading and trying different examples hasn't helped on this occasion. Please spoon-feed me here...
I want to apply two custom functions to each row of a dataframe and add the results as two new columns. Here is my sample code:
# Required packages:
library(plyr)
FindMFE <- function(x) {
  MFE <- max(x, na.rm = TRUE)
  MFE <- ifelse(is.infinite(MFE) | (MFE < 0), 0, MFE)
  return(MFE)
}
FindMAE <- function(x) {
  MAE <- min(x, na.rm = TRUE)
  MAE <- ifelse(is.infinite(MAE) | (MAE > 0), 0, MAE)
  return(MAE)
}
FindMAEandMFE <- function(x){
  # I know this next line is wrong...
  z <- apply(x, 1, FindMFE, FindMFE)
  return(z)
}
df1 <- data.frame(Bar1=c(1,2,3,-3,-2,-1),Bar2=c(3,1,3,-2,-3,-1))
df1 = transform(df1,
FindMAEandMFE(df1)
)
#DF1 should end up with the following data...
#Bar1 Bar2 MFE MAE
#1 3 3 0
#2 1 2 0
#3 3 3 0
#-3 -2 0 -3
#-2 -3 0 -3
#-1 -1 0 -1
It would be great to get one answer using the plyr library and one using a more base-R-like approach. Both will aid my understanding. Of course, please point out where I'm going wrong if it's obvious. ;-)
Now back to the help files for me!
Edit: I would like a multivariate solution as column names may change and expand over time. It also allows re-use of the code in future.
I think you are overcomplicating this. What is wrong with two separate apply() calls? There is, however, a far better way to do this that involves no looping or apply() calls at all. I'll deal with these separately, but the second solution is preferable as it is truly vectorised.
First, two separate apply() calls using only base R functions:
df1 <- data.frame(Bar1=c(1,2,3,-3,-2,-1),Bar2=c(3,1,3,-2,-3,-1))
df1 <- transform(df1, MFE = apply(df1, 1, FindMFE), MAE = apply(df1, 1, FindMAE))
df1
Which gives:
> df1
Bar1 Bar2 MFE MAE
1 1 3 3 0
2 2 1 2 0
3 3 3 3 0
4 -3 -2 0 -3
5 -2 -3 0 -3
6 -1 -1 0 -1
OK, looping over the rows of df1 twice is perhaps a little inefficient, but even for big problems you've already spent more time thinking about doing this cleverly in a single pass than you will save by doing it that way.
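If a single pass over the rows is still wanted, one way to sketch it is to have the function passed to apply() return both values as a named vector. This assumes FindMFE() and FindMAE() behave as in the question (they are repeated here, slightly shortened, so the block runs on its own). apply() returns one column per row in that case, so the result is transposed before binding.

```r
FindMFE <- function(x) {
  MFE <- max(x, na.rm = TRUE)
  ifelse(is.infinite(MFE) | (MFE < 0), 0, MFE)
}
FindMAE <- function(x) {
  MAE <- min(x, na.rm = TRUE)
  ifelse(is.infinite(MAE) | (MAE > 0), 0, MAE)
}

df1 <- data.frame(Bar1 = c(1, 2, 3, -3, -2, -1), Bar2 = c(3, 1, 3, -2, -3, -1))

# One pass over the rows: each row yields a named vector of length 2
both <- apply(df1, 1, function(r) c(MFE = FindMFE(r), MAE = FindMAE(r)))

# apply() returns a 2 x 6 matrix here, so transpose before binding
res <- cbind(df1, t(both))
res
```

Whether this is actually faster than two apply() calls would need to be measured; it mainly saves typing.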
pmax() and pmin()
So a better way of doing this is to note the pmax() and pmin() functions and realise that they can do what each of the apply(df1, 1, FindFOO) calls was doing. For example:
> (tmp <- with(df1, pmax(0, Bar1, Bar2, na.rm = TRUE)))
[1] 3 2 3 0 0 0
would be MFE from your question. This is very simple to work with if you always have two columns and they are Bar1 and Bar2, or the first 2 columns of df1. But it is not very general; what if you have multiple columns you want to compute this over? pmax(df1[, 1:2], na.rm = TRUE) won't do what we want:
> pmax(df1[, 1:2], na.rm = TRUE)
Bar1 Bar2
1 1 3
2 2 1
3 3 3
4 -3 -2
5 -2 -3
6 -1 -1
The trick to getting a general solution using pmax() and pmin() is to use do.call() to arrange the calls to those two functions for us. Updating your functions to use this idea we have:
FindMFE2 <- function(x) {
  MFE <- do.call(pmax, c(as.list(x), 0, na.rm = TRUE))
  MFE[is.infinite(MFE)] <- 0
  MFE
}
FindMAE2 <- function(x) {
  MAE <- do.call(pmin, c(as.list(x), 0, na.rm = TRUE))
  MAE[is.infinite(MAE)] <- 0
  MAE
}
which give:
> transform(df1, MFE = FindMFE2(df1), MAE = FindMAE2(df1))
Bar1 Bar2 MFE MAE
1 1 3 3 0
2 2 1 2 0
3 3 3 3 0
4 -3 -2 0 -3
5 -2 -3 0 -3
6 -1 -1 0 -1
and not an apply() in sight. If you want to do this in a single step, this is now much easier to wrap:
FindMAEandMFE2 <- function(x){
  cbind(MFE = FindMFE2(x), MAE = FindMAE2(x))
}
which can be used as:
> cbind(df1, FindMAEandMFE2(df1))
Bar1 Bar2 MFE MAE
1 1 3 3 0
2 2 1 2 0
3 3 3 3 0
4 -3 -2 0 -3
5 -2 -3 0 -3
6 -1 -1 0 -1
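Because do.call() builds the call from whatever columns are in the data frame, the same function should work unchanged when more columns are added, which is what the edit to the question asks for. A minimal check, assuming FindMFE2() as defined above (repeated so the block is self-contained) and a hypothetical third column Bar3 that contains an NA:

```r
FindMFE2 <- function(x) {
  MFE <- do.call(pmax, c(as.list(x), 0, na.rm = TRUE))
  MFE[is.infinite(MFE)] <- 0
  MFE
}

# df3 and Bar3 are made up, used only for illustration
df3 <- data.frame(Bar1 = c(1, -3), Bar2 = c(3, -2), Bar3 = c(5, NA))

# do.call() expands the list into pmax(Bar1, Bar2, Bar3, 0, na.rm = TRUE)
by_docall <- FindMFE2(df3)
by_hand   <- with(df3, pmax(Bar1, Bar2, Bar3, 0, na.rm = TRUE))

# Compare the generated call with the hand-written one
all.equal(by_docall, by_hand)
```

The same check applies to FindMAE2() with pmin() in place of pmax().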
I show three alternative one-liners:
- the each function of plyr
- the plyr each function with base R
- the pmin and pmax functions, which are vectorised
The plyr package defines the each function that does what you want. From ?each: "Aggregate multiple functions into a single function." This means you can solve your problem using a one-liner:
library(plyr)
adply(df1, 1, each(MFE = function(x) max(x, 0), MAE = function(x) min(x, 0)))
Bar1 Bar2 MFE MAE
1 1 3 3 0
2 2 1 2 0
3 3 3 3 0
4 -3 -2 0 -3
5 -2 -3 0 -3
6 -1 -1 0 -1
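To make the idea behind each() concrete, here is a rough base R sketch of a function combiner. combine_funs() is a hypothetical helper written for illustration, not part of plyr, and it does not try to reproduce everything each() does; it only shows how several named functions can be folded into one that returns a named vector.

```r
# Hypothetical helper: takes named functions and returns a single function
# that applies all of them to the same input and collects the results
combine_funs <- function(...) {
  funs <- list(...)
  function(x) sapply(funs, function(f) f(x))
}

mfe_mae <- combine_funs(MFE = function(x) max(x, 0),
                        MAE = function(x) min(x, 0))

# Each call returns a named vector with elements MFE and MAE
mfe_mae(c(-3, -2))
mfe_mae(c(1, 3))
```

The names given to the functions become the names of the result, which is what ends up as the new column names.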
You can, of course, use each with base functions. Here is how you can use it with apply; just note that you have to transpose the results before adding them to your original data.frame.
library(plyr)
data.frame(df1,
  t(apply(df1, 1, each(MFE = function(x) max(x, 0), MAE = function(x) min(x, 0)))))
Bar1 Bar2 MFE MAE
1 1 3 3 0
2 2 1 2 0
3 3 3 3 0
4 -3 -2 0 -3
5 -2 -3 0 -3
6 -1 -1 0 -1
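The transpose is needed because apply() puts each function result into a column when the function returns more than one value. A small sketch with base R's range(), which returns two values per row:

```r
df1 <- data.frame(Bar1 = c(1, 2, 3, -3, -2, -1), Bar2 = c(3, 1, 3, -2, -3, -1))

# range() returns c(min, max), so apply() yields one column per row of df1
r <- apply(df1, 1, range)
dim(r)     # 2 rows, 6 columns

# Transposing gives one row per row of df1, ready to bind as new columns
dim(t(r))  # 6 rows, 2 columns
```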
Using the vectorised functions pmin and pmax, you can use this one-liner:
transform(df1, MFE=pmax(0, Bar1, Bar2), MAE=pmin(0, Bar1, Bar2))
Bar1 Bar2 MFE MAE
1 1 3 3 0
2 2 1 2 0
3 3 3 3 0
4 -3 -2 0 -3
5 -2 -3 0 -3
6 -1 -1 0 -1
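If the columns can contain missing values, as the na.rm = TRUE in the question's functions suggests, pmax() and pmin() accept the same argument. A small sketch with made-up data (df_na is hypothetical); because 0 is always one of the arguments, a row consisting only of NAs comes out as 0:

```r
# Made-up data with missing values, for illustration only
df_na <- data.frame(Bar1 = c(1, NA, -2), Bar2 = c(NA, NA, -3))

# na.rm = TRUE makes pmax()/pmin() skip NAs within each row
out <- transform(df_na,
                 MFE = pmax(0, Bar1, Bar2, na.rm = TRUE),
                 MAE = pmin(0, Bar1, Bar2, na.rm = TRUE))
out
```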