I have a dataframe that I need to group by a combination of column entries in order to conditionally mutate several columns using only an if statement (without an else condition).
More specifically, I want to replace the column values of a group with their sum if that sum crosses a pre-defined threshold; otherwise the values should remain unchanged.
I have tried doing this using both if_else and case_when, but these functions either require a "false" argument (if_else) or by default set values that are not matched to NA (case_when):
# Attempt 1: if_else() errors because the required `false` argument is missing
iris_mutated <- iris %>%
  dplyr::group_by(Species) %>%
  dplyr::mutate(Sepal.Length=if_else(sum(Sepal.Length)>250, sum(Sepal.Length)),
                Sepal.Width=if_else(sum(Sepal.Width)>170, sum(Sepal.Width)),
                Petal.Length=if_else(sum(Petal.Length)>70, sum(Petal.Length)),
                Petal.Width=if_else(sum(Petal.Width)>15, sum(Petal.Width)))

# Attempt 2: case_when() runs, but groups below the threshold become NA
iris_mutated <- iris %>%
  dplyr::group_by(Species) %>%
  dplyr::mutate(Sepal.Length=case_when(sum(Sepal.Length)>250 ~ sum(Sepal.Length)),
                Sepal.Width=case_when(sum(Sepal.Width)>170 ~ sum(Sepal.Width)),
                Petal.Length=case_when(sum(Petal.Length)>70 ~ sum(Petal.Length)),
                Petal.Width=case_when(sum(Petal.Width)>15 ~ sum(Petal.Width)))
Any ideas how to do this instead?
Edit:
Here is an example of the expected output. The sum of the petal width within each species group is 12.3 for setosa, 101.3 for virginica and 66.3 for versicolor. If I require that this sum be at least 15 for the values to be replaced by the sum (otherwise the original values should be kept), then I expect the following output (only showing the columns "Petal.Width" and "Species"):
Petal.Width Species
1 0.2 setosa
2 0.2 setosa
3 0.2 setosa
4 0.2 setosa
5 0.2 setosa
6 0.4 setosa
7 0.3 setosa
8 0.2 setosa
9 0.2 setosa
10 0.1 setosa
#...#
50 0.2 setosa
51 66.3 versicolor
52 66.3 versicolor
53 66.3 versicolor
#...#
100 66.3 versicolor
101 101.3 virginica
102 101.3 virginica
103 101.3 virginica
#...#
150 101.3 virginica
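The group sums quoted above can be checked with a short sketch like the following. It assumes dplyr is installed; the names petal_sums and total are only illustrative.

```r
library(dplyr)

# Sum of Petal.Width per species in the built-in iris data
petal_sums <- iris %>%
  group_by(Species) %>%
  summarise(total = sum(Petal.Width))

print(petal_sums)
```

Only setosa falls below the cutoff of 15, which is why its rows keep their original values in the expected output.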
The group_by() function belongs to the dplyr package for R and groups the rows of a data frame. On its own, group_by() does not change the data; it only records the grouping so that later verbs operate per group.
So what is mutate? mutate() is one of the most useful dplyr verbs. You can use it to transform data (variables in your data.frame) and add the result as a new variable to the data.frame. I tend to think of this much like adding a formula in Excel to calculate the value of a new column based on previous columns.
We’ll start by loading dplyr. The most important grouping verb is group_by(): it takes a data frame and one or more variables to group by. You can see the grouping when you print the data, or use tally() to count the number of rows in each group. The sort argument is useful if you want to see the largest groups up front.
Much work with data involves subsetting, defining new columns, sorting or otherwise manipulating the data. dplyr has five functions (verbs) for such actions, which all start with a data.frame or tbl_df and produce another one.
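As a rough illustration of the grouping and counting described here, the following sketch uses the built-in iris data; the names by_species and group_sizes are made up for this example.

```r
library(dplyr)

# Group the rows by species; the data itself is unchanged
by_species <- iris %>% group_by(Species)

# Printing a grouped data frame shows the grouping in its header
print(by_species)

# tally() counts the rows in each group; sort = TRUE lists the largest first
group_sizes <- by_species %>% tally(sort = TRUE)
print(group_sizes)
```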
I think this is what you are after, using Johnny's method. You shouldn't hit an error if case_when returns the original value whenever the sum is not greater than the cutoff:
library(dplyr)

iris_mutated <- iris %>%
  group_by(Species) %>%
  # Use the group sum when it exceeds the cutoff; otherwise keep the original values
  mutate(Sepal.Length = case_when(sum(Sepal.Length) > 250 ~ sum(Sepal.Length),
                                  TRUE ~ Sepal.Length),
         Sepal.Width = case_when(sum(Sepal.Width) > 170 ~ sum(Sepal.Width),
                                 TRUE ~ Sepal.Width),
         Petal.Length = case_when(sum(Petal.Length) > 70 ~ sum(Petal.Length),
                                  TRUE ~ Petal.Length),
         Petal.Width = case_when(sum(Petal.Width) > 15 ~ sum(Petal.Width),
                                 TRUE ~ Petal.Width))
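Since the condition here is a single TRUE or FALSE per group, an ordinary if/else wrapped in a small helper should also work and avoids repeating each sum. This is only a sketch of an alternative, not part of the method above; sum_if_above is a hypothetical helper name, not a dplyr function.

```r
library(dplyr)

# Hypothetical helper: return the group total (repeated for every row)
# when it exceeds `cutoff`, otherwise return the values unchanged.
sum_if_above <- function(x, cutoff) {
  total <- sum(x)
  if (total > cutoff) rep(total, length(x)) else x
}

iris_mutated <- iris %>%
  group_by(Species) %>%
  mutate(
    Sepal.Length = sum_if_above(Sepal.Length, 250),
    Sepal.Width  = sum_if_above(Sepal.Width, 170),
    Petal.Length = sum_if_above(Petal.Length, 70),
    Petal.Width  = sum_if_above(Petal.Width, 15)
  ) %>%
  ungroup()
```

With the cutoff of 15, the setosa rows keep their original Petal.Width, while the versicolor and virginica rows are replaced by their group sums.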