I frequently have to calculate new variables from existing ones in a data frame based on a condition of a factor variable.
Edit Getting 4 answers in like 2 mins, I realised I have oversimplified my example. Please see below.
Simple example:
df <- data.frame(value=c(1:5),class=letters[1:5])
df
value class
1 a
2 b
3 c
4 d
5 e
I can use such code
df %>%
mutate(result=NA) %>%
mutate(result=ifelse(class=="a",value*1,result)) %>%
mutate(result=ifelse(class=="b",value*2,result)) %>%
mutate(result=ifelse(class=="c",value*3,result)) %>%
mutate(result=ifelse(class=="d",value*4,result)) %>%
mutate(result=ifelse(class=="e",value*5,result))
to perform conditional calculations on my variables, resulting in
value class result
1 a 1
2 b 4
3 c 9
4 d 16
5 e 25
As in reality the number of classes is larger and the calculations are more complex, however, I would prefer something cleaner, like this
df %>%
mutate(results=switch(levels(class),
"a"=value*1,
"b"=value*2,
"c"=value*3,
"d"=value*4,
"e"=value*5))
which obviously doesn't work
Error in switch(levels(1:5), a = 1:5 * 1, b = 1:5 * 2, c = 1:5 * 3, d = 1:5 * :
EXPR must be a length 1 vector
Is there a way I can do this more nicely with dplyr piping (or else)?
Edit In reality, I have more value variables to include in my calculations and they are not simple consecutive vectors, they are thousands of rows of measured data.
Here my simple example with a second random value variable (again, it's more in my real data)
df <- data.frame(value1=c(1:5),value2=c(2.3,3.6,7.2,5.6,0),class=letters[1:5])
value1 value2 class
1 2.3 a
2 3.6 b
3 7.2 c
4 5.6 d
5 0.0 e
and my calculations are different for every condition. I understand I can simplify somewhat like this
df %>%
mutate(result=NA,
result=ifelse(class=="a",value1*1,result),
result=ifelse(class=="b",value1/value2*4,result),
result=ifelse(class=="c",value2*3.57,result),
result=ifelse(class=="d",value1+value2*2,result),
result=ifelse(class=="e",value2/value1/5,result))
A working solution similar to the above switch example would be even cleaner, though.
You can create a conditional column in pandas DataFrame by using np. where() , np. select() , DataFrame. map() , DataFrame.
Applying an IF condition in Pandas DataFrameIf the number is equal or lower than 4, then assign the value of 'True' Otherwise, if the number is greater than 4, then assign the value of 'False'
Definition and Usage. The notna() method returns a DataFrame object where all the values are replaced with a Boolean value True for NOT NA (not-a -number) values, and otherwise False.
The abs() function is used to get a Series/DataFrame with absolute numeric value of each element. This function only applies to elements that are all numeric. Returns: Series/DataFrame containing the absolute value of each element.
No need to use ifelse
here, You can use merge
:
df <- data.frame(value=c(1:5),class=letters[1:5])
cond <- data.frame(ratio=c(1:5),class=letters[1:5])
transform(merge(df,cond),result=value*ratio)
class value ratio result
1 a 1 1 1
2 b 2 2 4
3 c 3 3 9
4 d 4 4 16
5 e 5 5 25
It looks that the OP wants to apply a different function for each class. Here a data.table solution. I think it is simple and readable. First, I create function for each factor:
## here each function takes a data.table as an single argument
fns <- list(
function(x) x[,value1]*1,
function(x) x[,value1]/x[,value2]*4,
function(x) x[,value2]*3.57,
function(x) x[,value1]+x[,value2]*2,
function(x) x[,value2]/x[,value1]/5
)
## create a names list here
## the names here are just the class factors
fns <- setNames(fns,letters[1:5])
Applying the function by class is straightforward. I create the function name , and I use do.call
to call a function by its name
## using data.table here for grouping feature
## .SD is the rest of columns except the grouping variable
## the code can also be written in dplyr or in base-R
library(data.table)
setDT(df)[,value:= fns[[class]](.SD),by=class]
value1 value2 class value
1: 1 2.3 a 1.000000
2: 2 3.6 b 2.222222
3: 3 7.2 c 25.704000
4: 4 5.6 d 15.200000
5: 5 0.0 e 0.000000
6: 1 2.3 a 1.000000
7: 2 3.6 b 2.222222
8: 3 7.2 c 25.704000
9: 4 5.6 d 15.200000
10: 5 0.0 e 0.000000
I use this df:
df <- data.frame(value1=c(1:5),value2=c(2.3,3.6,7.2,5.6,0),
class=rep(letters[1:5],2))
As I mentioned in the comments, this question is more or less the same as this one (and you should read the answer there to catch up on what's going on below):
library(data.table)
dt = as.data.table(df) # or setDT to convert in place
dt[, class := as.character(class)] # simpler
# create a data.table with *functions* to match each class
fns = data.table(cls = letters[1:5], fn = list(quote(value1*1), quote(value1/value2*4), quote(value2*3.57), quote(value1+value2*2), quote(value2/value1/5)), key = 'cls')
# I have to jump through hoops here, due to a bug or two, see below
setkey(dt, class)
newvals = dt[, eval(fns[class]$fn[[1]], .SD), by = class]$V1
dt[, result := newvals][]
# value1 value2 class result
#1: 1 2.3 a 1.000000
#2: 2 3.6 b 2.222222
#3: 3 7.2 c 25.704000
#4: 4 5.6 d 15.200000
#5: 5 0.0 e 0.000000
Due to a few bugs in data.table
the following, straightforward versions don't work yet:
dt[, result := eval(fns[class]$fn[[1]], .SD), by = class]
# or even better
dt[fns, result := eval(fn[[1]], .SD), by = .EACHI]
Bug reports have been filed.
I'm adding the suggestion in the comments from Frank below, as I think it's pretty cool and this way it's more likely to be preserved in SO. A more readable way of creating the function table is as follows:
quotem <- function(...) as.list(sys.call())[-1]
fnslist <- quotem(a = value1*1,
b = value1/value2*4,
c = value2*3.57,
d = value1+value2*2,
e = value2/value1/5)
fns = data.table(cls=names(fnslist),fn=fnslist,key="cls")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With