I have a data frame (df) below and I want to add an additional column, result, using dplyr that will take on the value 1 if z == "gone" and x is the maximum value for group y.
y x z
1 a 3 gone
2 a 5 gone
3 a 8 gone
4 a 9 gone
5 a 10 gone
6 b 1
7 b 2
8 b 4
9 b 6
10 b 7
If I were to simply select the maximum for each group it would be:
df %>%
  group_by(y) %>%
  slice(which.max(x))
which will return:
y x z
1 a 10 gone
2 b 7
This is not what I want. I need to use the max value of x for each group in y while also checking whether z == "gone"; if both conditions are TRUE, result should be 1, otherwise 0. This would look like:
y x z result
1 a 3 gone 0
2 a 5 gone 0
3 a 8 gone 0
4 a 9 gone 0
5 a 10 gone 1
6 b 1 0
7 b 2 0
8 b 4 0
9 b 6 0
10 b 7 0
I'm assuming I would use a conditional statement within mutate(), but I cannot seem to find an example. Please advise.
With dplyr you can use:
df %>% group_by(y) %>% mutate(result = +(x == max(x) & z == 'gone'))
The +(..) notation is shorthand for as.integer: the unary plus coerces the logical output to 1's and 0's. Some people don't like it, so it's a matter of shorter code versus readability. Whether it brings any efficiency gain is debatable and depends on the circumstances.
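To make the coercion concrete, here is a minimal base R sketch comparing the two spellings. The vector flag is a made-up stand-in for the result of a comparison such as x == max(x) & z == 'gone'; it is not taken from the question's data.

```r
# Hypothetical logical vector standing in for a comparison result
flag <- c(TRUE, FALSE, NA)

plus_form     <- +flag              # unary plus coerces logical to integer
explicit_form <- as.integer(flag)   # the explicit spelling of the same coercion

# Both forms give the same integer vector for a plain logical vector
same_result <- identical(plus_form, explicit_form)
result_type <- typeof(plus_form)
```

Note that NA stays NA under both forms, so rows where the condition cannot be evaluated will not silently become 0.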
Also, to appreciate what data.table and dplyr have done for data manipulation with R, let's do the same thing in the old-fashioned "split-apply-combine" way:
# split the data.frame by group
split.df <- split(df, df$y)
# apply the required function to each group
lst <- lapply(split.df, function(dfx) {
  dfx$result <- +(dfx$x == max(dfx$x) & dfx$z == "gone")
  dfx
})
# combine the results into a new data.frame
newdf <- do.call(rbind, lst)
We can do this with data.table. We convert the data.frame to a data.table with setDT(df); then, grouped by y, we build the logical condition (x equals the group maximum and z is 'gone'), coerce it to integer with as.integer, and assign (:=) the output to the new column result.
library(data.table)
setDT(df)[, result := as.integer(x==max(x) & z=='gone') , by = y]
df
# y x z result
# 1: a 3 gone 0
# 2: a 5 gone 0
# 3: a 8 gone 0
# 4: a 9 gone 0
# 5: a 10 gone 1
# 6: b 1 0
# 7: b 2 0
# 8: b 4 0
# 9: b 6 0
#10: b 7 0
Or we can use ave from base R:
df$result <- with(df, +(ave(x, y, FUN=max)==x & z=='gone' ))
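The ave call works because it returns a vector as long as x in which every element is replaced by its group's maximum, so the comparison with x is row-wise. The sketch below spells that out step by step. It rebuilds df from the question and assumes the blank z values in group b are empty strings; if they are actually NA, the row holding the group maximum would get NA rather than 0.

```r
# Rebuild the example data; blank z values are ASSUMED to be empty strings
df <- data.frame(
  y = rep(c("a", "b"), each = 5),
  x = c(3, 5, 8, 9, 10, 1, 2, 4, 6, 7),
  z = rep(c("gone", ""), each = 5),
  stringsAsFactors = FALSE
)

# ave() broadcasts each group's maximum back to every row of that group
group_max <- ave(df$x, df$y, FUN = max)

# Row-wise test: x is the group maximum AND z is "gone", coerced to 0/1
df$result <- as.integer(df$x == group_max & df$z == "gone")
```

Keeping group_max as a named intermediate makes the grouped step easy to inspect before the final comparison.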