I frequently have to calculate new variables from existing ones in a data frame based on a condition of a factor variable. Edit Getting 4 answers in like 2 mins, I realised I have oversimplified my example. Please see below. Simple example: <pre class="prettyprint"><code>df <- data.frame(value=c(1:5),class=letters[1:5]) df value class 1 a 2 b 3 c 4 d 5 e </code></pre> I can use such code <pre class="prettyprint"><code>df %>% mutate(result=NA) %>% mutate(result=ifelse(class=="a",value*1,result)) %>% mutate(result=ifelse(class=="b",value*2,result)) %>% mutate(result=ifelse(class=="c",value*3,result)) %>% mutate(result=ifelse(class=="d",value*4,result)) %>% mutate(result=ifelse(class=="e",value*5,result)) </code></pre> to perform conditional calculations on my variables, resulting in <pre class="prettyprint"><code>value class result 1 a 1 2 b 4 3 c 9 4 d 16 5 e 25 </code></pre> As in reality the number of classes is larger and the calculations are more complex, however, I would prefer something cleaner, like this <pre class="prettyprint"><code>df %>% mutate(results=switch(levels(class), "a"=value*1, "b"=value*2, "c"=value*3, "d"=value*4, "e"=value*5)) </code></pre> which obviously doesn't work <pre class="prettyprint"><code>Error in switch(levels(1:5), a = 1:5 * 1, b = 1:5 * 2, c = 1:5 * 3, d = 1:5 * : EXPR must be a length 1 vector </code></pre> Is there a way I can do this more nicely with dplyr piping (or else)? Edit In reality, I have more value variables to include in my calculations and they are not simple consecutive vectors, they are thousands of rows of measured data. Here my simple example with a second random value variable (again, it's more in my real data) <pre class="prettyprint"><code>df <- data.frame(value1=c(1:5),value2=c(2.3,3.6,7.2,5.6,0),class=letters[1:5]) value1 value2 class 1 2.3 a 2 3.6 b 3 7.2 c 4 5.6 d 5 0.0 e </code></pre> and my calculations are different for every condition. I understand I can simplify somewhat like this <pre class="prettyprint"><code>df %>% mutate(result=NA, result=ifelse(class=="a",value1*1,result), result=ifelse(class=="b",value1/value2*4,result), result=ifelse(class=="c",value2*3.57,result), result=ifelse(class=="d",value1+value2*2,result), result=ifelse(class=="e",value2/value1/5,result)) </code></pre> A working solution similar to the above switch example would be even cleaner, though.

As I mentioned in the comments, this question is more or less the same as this one (and you should read the answer there to catch up on what's going on below): <pre class="prettyprint"><code>library(data.table) dt = as.data.table(df) # or setDT to convert in place dt[, class := as.character(class)] # simpler # create a data.table with *functions* to match each class fns = data.table(cls = letters[1:5], fn = list(quote(value1*1), quote(value1/value2*4), quote(value2*3.57), quote(value1+value2*2), quote(value2/value1/5)), key = 'cls') # I have to jump through hoops here, due to a bug or two, see below setkey(dt, class) newvals = dt[, eval(fns[class]$fn[[1]], .SD), by = class]$V1 dt[, result := newvals][] # value1 value2 class result #1: 1 2.3 a 1.000000 #2: 2 3.6 b 2.222222 #3: 3 7.2 c 25.704000 #4: 4 5.6 d 15.200000 #5: 5 0.0 e 0.000000 </code></pre> Due to a few bugs in <code>data.table</code> the following, straightforward versions don't work yet: <pre class="prettyprint"><code>dt[, result := eval(fns[class]$fn[[1]], .SD), by = class] # or even better dt[fns, result := eval(fn[[1]], .SD), by = .EACHI] </code></pre> Bug reports have been filed. <hr> I'm adding the suggestion in the comments from Frank below, as I think it's pretty cool and this way it's more likely to be preserved in SO. A more readable way of creating the function table is as follows: <pre class="prettyprint"><code>quotem <- function(...) as.list(sys.call())[-1] fnslist <- quotem(a = value1*1, b = value1/value2*4, c = value2*3.57, d = value1+value2*2, e = value2/value1/5) fns = data.table(cls=names(fnslist),fn=fnslist,key="cls") </code></pre>

conditional calculations in data frame

Tags:

r

dplyr

I frequently have to calculate new variables from existing ones in a data frame based on a condition of a factor variable.

Edit Getting 4 answers in like 2 mins, I realised I have oversimplified my example. Please see below.

Simple example:

df <- data.frame(value=c(1:5),class=letters[1:5])
df
value class
1     a
2     b
3     c
4     d
5     e

I can use such code

df %>% 
    mutate(result=NA) %>%
    mutate(result=ifelse(class=="a",value*1,result)) %>%
    mutate(result=ifelse(class=="b",value*2,result)) %>%
    mutate(result=ifelse(class=="c",value*3,result)) %>%
    mutate(result=ifelse(class=="d",value*4,result)) %>%
    mutate(result=ifelse(class=="e",value*5,result))

to perform conditional calculations on my variables, resulting in

value class result
 1     a      1
 2     b      4
 3     c      9
 4     d     16
 5     e     25

As in reality the number of classes is larger and the calculations are more complex, however, I would prefer something cleaner, like this

df %>%
mutate(results=switch(levels(class),
                    "a"=value*1,
                    "b"=value*2,
                    "c"=value*3,
                    "d"=value*4,
                    "e"=value*5))

which obviously doesn't work

Error in switch(levels(1:5), a = 1:5 * 1, b = 1:5 * 2, c = 1:5 * 3, d =  1:5 *  : 
  EXPR must be a length 1 vector

Is there a way I can do this more nicely with dplyr piping (or else)?

Edit In reality, I have more value variables to include in my calculations and they are not simple consecutive vectors, they are thousands of rows of measured data.

Here my simple example with a second random value variable (again, it's more in my real data)

df <- data.frame(value1=c(1:5),value2=c(2.3,3.6,7.2,5.6,0),class=letters[1:5])
value1 value2 class
  1    2.3     a
  2    3.6     b
  3    7.2     c
  4    5.6     d
  5    0.0     e

and my calculations are different for every condition. I understand I can simplify somewhat like this

df %>% 
mutate(result=NA,
     result=ifelse(class=="a",value1*1,result),
     result=ifelse(class=="b",value1/value2*4,result),
     result=ifelse(class=="c",value2*3.57,result),
     result=ifelse(class=="d",value1+value2*2,result),
     result=ifelse(class=="e",value2/value1/5,result))

A working solution similar to the above switch example would be even cleaner, though.

685

asked Jun 17 '15 16:06

user3460194

2 Answers

No need to use ifelse here, You can use merge:

df <- data.frame(value=c(1:5),class=letters[1:5])
cond <- data.frame(ratio=c(1:5),class=letters[1:5])
transform(merge(df,cond),result=value*ratio)

  class value ratio result
1     a     1     1      1
2     b     2     2      4
3     c     3     3      9
4     d     4     4     16
5     e     5     5     25

After OP edit

It looks that the OP wants to apply a different function for each class. Here a data.table solution. I think it is simple and readable. First, I create function for each factor:

## here each function takes a data.table as an single argument
fns <- list(
  function(x) x[,value1]*1,
  function(x) x[,value1]/x[,value2]*4,
  function(x) x[,value2]*3.57,
  function(x) x[,value1]+x[,value2]*2,
  function(x) x[,value2]/x[,value1]/5
)
## create a names list here 
## the names here are just the class factors
fns <- setNames(fns,letters[1:5])

Applying the function by class is straightforward. I create the function name , and I use do.call to call a function by its name

## using data.table here for grouping feature
## .SD is the rest of columns except the grouping variable
## the code can also be written in dplyr or in base-R
library(data.table)
setDT(df)[,value:= fns[[class]](.SD),by=class]

     value1 value2 class     value
 1:      1    2.3     a  1.000000
 2:      2    3.6     b  2.222222
 3:      3    7.2     c 25.704000
 4:      4    5.6     d 15.200000
 5:      5    0.0     e  0.000000
 6:      1    2.3     a  1.000000
 7:      2    3.6     b  2.222222
 8:      3    7.2     c 25.704000
 9:      4    5.6     d 15.200000
10:      5    0.0     e  0.000000

I use this df:

df <- data.frame(value1=c(1:5),value2=c(2.3,3.6,7.2,5.6,0),
                 class=rep(letters[1:5],2))

185

answered Sep 30 '22 17:09

agstudy

As I mentioned in the comments, this question is more or less the same as this one (and you should read the answer there to catch up on what's going on below):

library(data.table)
dt = as.data.table(df) # or setDT to convert in place
dt[, class := as.character(class)] # simpler

# create a data.table with *functions* to match each class
fns = data.table(cls = letters[1:5], fn = list(quote(value1*1), quote(value1/value2*4), quote(value2*3.57), quote(value1+value2*2), quote(value2/value1/5)), key = 'cls')

# I have to jump through hoops here, due to a bug or two, see below
setkey(dt, class)
newvals = dt[, eval(fns[class]$fn[[1]], .SD), by = class]$V1
dt[, result := newvals][]
#   value1 value2 class    result
#1:      1    2.3     a  1.000000
#2:      2    3.6     b  2.222222
#3:      3    7.2     c 25.704000
#4:      4    5.6     d 15.200000
#5:      5    0.0     e  0.000000

Due to a few bugs in data.table the following, straightforward versions don't work yet:

dt[, result := eval(fns[class]$fn[[1]], .SD), by = class]

# or even better
dt[fns, result := eval(fn[[1]], .SD), by = .EACHI]

Bug reports have been filed.

I'm adding the suggestion in the comments from Frank below, as I think it's pretty cool and this way it's more likely to be preserved in SO. A more readable way of creating the function table is as follows:

quotem <- function(...) as.list(sys.call())[-1]

fnslist <- quotem(a = value1*1,
                  b = value1/value2*4,
                  c = value2*3.57,
                  d = value1+value2*2,
                  e = value2/value1/5)

fns = data.table(cls=names(fnslist),fn=fnslist,key="cls")

answered Sep 30 '22 15:09

eddi

Related questions
                            
                                "Not Join" in R
                            
                                Melt a dataframe by pattern in colnames
                            
                                Creating A Deck Of Cards In R Without Using While And Double For Loop
                            
                                R within Rstudio cannot find rmarkdown package
                            
                                shiny, ggvis, and add_tooltip with HTML
                            
                                R remove rows from panel while keeping the panel balanced
                            
                                Loading Rcpp and Running Sample Code
                            
                                How to get an expression(cos (alpha)) into a labels?
                            
                                Remove values outside Loess curve limits
                            
                                Add colored arrow to axis of ggplot2 (partially outside plot region)
                            
                                The explanation of the verbose mode during running randomForest in R
                            
                                Calculate row-wise matrix cumsum from vector
                            
                                r - select last n occurrences for each group
                            
                                Meaning of %o% in R
                            
                                Is S-PLUS dead? [closed]
                            
                                Display SpatialPolygonsDataFrame on leaflet map with R
                            
                                R as.POSIXct() dropping hours minutes and seconds
                            
                                Evaluate a Chunk based on the output format of knitr
                            
                                Proper way to have two functions access a single function's environment?
                            
                                Deleting rows in R conditionally

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With