Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

conditional calculations in data frame

Tags:

r

dplyr

I frequently have to calculate new variables from existing ones in a data frame based on a condition of a factor variable.

Edit Getting 4 answers in like 2 mins, I realised I have oversimplified my example. Please see below.

Simple example:

df <- data.frame(value=c(1:5),class=letters[1:5])
df
value class
1     a
2     b
3     c
4     d
5     e

I can use such code

df %>% 
    mutate(result=NA) %>%
    mutate(result=ifelse(class=="a",value*1,result)) %>%
    mutate(result=ifelse(class=="b",value*2,result)) %>%
    mutate(result=ifelse(class=="c",value*3,result)) %>%
    mutate(result=ifelse(class=="d",value*4,result)) %>%
    mutate(result=ifelse(class=="e",value*5,result))

to perform conditional calculations on my variables, resulting in

value class result
 1     a      1
 2     b      4
 3     c      9
 4     d     16
 5     e     25

As in reality the number of classes is larger and the calculations are more complex, however, I would prefer something cleaner, like this

df %>%
mutate(results=switch(levels(class),
                    "a"=value*1,
                    "b"=value*2,
                    "c"=value*3,
                    "d"=value*4,
                    "e"=value*5))

which obviously doesn't work

Error in switch(levels(1:5), a = 1:5 * 1, b = 1:5 * 2, c = 1:5 * 3, d =  1:5 *  : 
  EXPR must be a length 1 vector

Is there a way I can do this more nicely with dplyr piping (or else)?

Edit In reality, I have more value variables to include in my calculations and they are not simple consecutive vectors, they are thousands of rows of measured data.

Here my simple example with a second random value variable (again, it's more in my real data)

df <- data.frame(value1=c(1:5),value2=c(2.3,3.6,7.2,5.6,0),class=letters[1:5])
value1 value2 class
  1    2.3     a
  2    3.6     b
  3    7.2     c
  4    5.6     d
  5    0.0     e

and my calculations are different for every condition. I understand I can simplify somewhat like this

df %>% 
mutate(result=NA,
     result=ifelse(class=="a",value1*1,result),
     result=ifelse(class=="b",value1/value2*4,result),
     result=ifelse(class=="c",value2*3.57,result),
     result=ifelse(class=="d",value1+value2*2,result),
     result=ifelse(class=="e",value2/value1/5,result))

A working solution similar to the above switch example would be even cleaner, though.

like image 685
user3460194 Avatar asked Jun 17 '15 16:06

user3460194


People also ask

How do I add a conditional column to a data frame?

You can create a conditional column in pandas DataFrame by using np. where() , np. select() , DataFrame. map() , DataFrame.

How do I apply a condition in pandas?

Applying an IF condition in Pandas DataFrameIf the number is equal or lower than 4, then assign the value of 'True' Otherwise, if the number is greater than 4, then assign the value of 'False'

What is Notna in Python?

Definition and Usage. The notna() method returns a DataFrame object where all the values are replaced with a Boolean value True for NOT NA (not-a -number) values, and otherwise False.

How do you find the absolute value of a data frame?

The abs() function is used to get a Series/DataFrame with absolute numeric value of each element. This function only applies to elements that are all numeric. Returns: Series/DataFrame containing the absolute value of each element.


2 Answers

No need to use ifelse here, You can use merge:

df <- data.frame(value=c(1:5),class=letters[1:5])
cond <- data.frame(ratio=c(1:5),class=letters[1:5])
transform(merge(df,cond),result=value*ratio)

  class value ratio result
1     a     1     1      1
2     b     2     2      4
3     c     3     3      9
4     d     4     4     16
5     e     5     5     25

After OP edit

It looks that the OP wants to apply a different function for each class. Here a data.table solution. I think it is simple and readable. First, I create function for each factor:

## here each function takes a data.table as an single argument
fns <- list(
  function(x) x[,value1]*1,
  function(x) x[,value1]/x[,value2]*4,
  function(x) x[,value2]*3.57,
  function(x) x[,value1]+x[,value2]*2,
  function(x) x[,value2]/x[,value1]/5
)
## create a names list here 
## the names here are just the class factors
fns <- setNames(fns,letters[1:5])

Applying the function by class is straightforward. I create the function name , and I use do.call to call a function by its name

## using data.table here for grouping feature
## .SD is the rest of columns except the grouping variable
## the code can also be written in dplyr or in base-R
library(data.table)
setDT(df)[,value:= fns[[class]](.SD),by=class]

     value1 value2 class     value
 1:      1    2.3     a  1.000000
 2:      2    3.6     b  2.222222
 3:      3    7.2     c 25.704000
 4:      4    5.6     d 15.200000
 5:      5    0.0     e  0.000000
 6:      1    2.3     a  1.000000
 7:      2    3.6     b  2.222222
 8:      3    7.2     c 25.704000
 9:      4    5.6     d 15.200000
10:      5    0.0     e  0.000000

I use this df:

df <- data.frame(value1=c(1:5),value2=c(2.3,3.6,7.2,5.6,0),
                 class=rep(letters[1:5],2))
like image 185
agstudy Avatar answered Sep 30 '22 17:09

agstudy


As I mentioned in the comments, this question is more or less the same as this one (and you should read the answer there to catch up on what's going on below):

library(data.table)
dt = as.data.table(df) # or setDT to convert in place
dt[, class := as.character(class)] # simpler

# create a data.table with *functions* to match each class
fns = data.table(cls = letters[1:5], fn = list(quote(value1*1), quote(value1/value2*4), quote(value2*3.57), quote(value1+value2*2), quote(value2/value1/5)), key = 'cls')

# I have to jump through hoops here, due to a bug or two, see below
setkey(dt, class)
newvals = dt[, eval(fns[class]$fn[[1]], .SD), by = class]$V1
dt[, result := newvals][]
#   value1 value2 class    result
#1:      1    2.3     a  1.000000
#2:      2    3.6     b  2.222222
#3:      3    7.2     c 25.704000
#4:      4    5.6     d 15.200000
#5:      5    0.0     e  0.000000

Due to a few bugs in data.table the following, straightforward versions don't work yet:

dt[, result := eval(fns[class]$fn[[1]], .SD), by = class]

# or even better
dt[fns, result := eval(fn[[1]], .SD), by = .EACHI]

Bug reports have been filed.


I'm adding the suggestion in the comments from Frank below, as I think it's pretty cool and this way it's more likely to be preserved in SO. A more readable way of creating the function table is as follows:

quotem <- function(...) as.list(sys.call())[-1]

fnslist <- quotem(a = value1*1,
                  b = value1/value2*4,
                  c = value2*3.57,
                  d = value1+value2*2,
                  e = value2/value1/5)

fns = data.table(cls=names(fnslist),fn=fnslist,key="cls")
like image 32
eddi Avatar answered Sep 30 '22 15:09

eddi