Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using dplyr to group_by and conditionally mutate only with if (without else) statement

Tags:

r

dplyr

I have a dataframe that I need to group by a combination of columns entries in order to conditionally mutate several columns using only an if statement (without an else condition).

More specifically, I want to sum up the column values of a certain group if they cross a pre-defined threshold, otherwise the values should remain unchanged.

I have tried doing this using both if_else and case_when but these functions require either a "false" argument (if_else) or by default set values that are not matched to NA (case_when):

iris_mutated <- iris %>%
  dplyr::group_by(Species) %>%
  dplyr::mutate(Sepal.Length=if_else(sum(Sepal.Length)>250, sum(Sepal.Length)),
                Sepal.Width=if_else(sum(Sepal.Width)>170, sum(Sepal.Width)),
                Petal.Length=if_else(sum(Petal.Length)>70, sum(Petal.Length)),
                Petal.Width=if_else(sum(Petal.Width)>15, sum(Petal.Width)))

iris_mutated <- iris %>%
  dplyr::group_by(Species) %>%
  dplyr::mutate(Sepal.Length=case_when(sum(Sepal.Length)>250 ~ sum(Sepal.Length)),
                Sepal.Width=case_when(sum(Sepal.Width)>170 ~ sum(Sepal.Width)),
                Petal.Length=case_when(sum(Petal.Length)>70 ~ sum(Petal.Length)),
                Petal.Width=case_when(sum(Petal.Width)>15 ~ sum(Petal.Width)))

Any ideas how to do this instead?

Edit:

Here is an example for the expected output. The sum of the petal width for all species-wise grouped entries is 12.3 for setosa, 101.3 for virginica and 66.3 for versicolor. If I require that this sum should be at least 15 for the values to be summed up (otherwise the original value should be kept), then I expect the following output (only showing the columns "Petal.Width" and "Species"):

Petal.Width    Species
1           0.2     setosa
2           0.2     setosa
3           0.2     setosa
4           0.2     setosa
5           0.2     setosa
6           0.4     setosa
7           0.3     setosa
8           0.2     setosa
9           0.2     setosa
10          0.1     setosa
#...#
50          0.2     setosa
51          66.3 versicolor
52          66.3 versicolor
53          66.3 versicolor
#...#
100         66.3 versicolor
101         101.3  virginica
102         101.3  virginica
103         101.3  virginica
#...#
150         101.3  virginica
like image 376
atreju Avatar asked Jan 28 '19 14:01

atreju


People also ask

What is the use of dplyr group_by () in R?

Group_by () function belongs to the dplyr package in the R programming language, which groups the data frames. Group_by () function alone will not give any output.

What is mutate in dplyr?

So what is mutate? mutate () is one of the most useful dplyr verbs. You can use it to transform data (variables in your data.frame) and add it as a new variable into the data.frame. I tend to think of this much like adding a formula in Excel to calculate the value of a new column based on previous columns. You can do lots of things such as:

How do I Group data in Python dplyr?

We’ll start by loading dplyr: The most important grouping verb is group_by (): it takes a data frame and one or more variables to group by: You can see the grouping when you print the data: Or use tally () to count the number of rows in each group. The sort argument is useful if you want to see the largest groups up front.

How does dplyr work with data?

Much work with data involvces subsetting, defining new columns, sorting or otherwise manipulating the data. dplyr has five functions (verbs) for such actions, that all start with a data.frame or tbl_df and produce another one.


1 Answers

I think you are after this? Using Johnny's method. You shouldn't hit an error when you use the original value as part of case_when in the case when the sum is not greater than the cutoff...

iris_mutated <- iris %>% 
  group_by(Species) %>% 
  mutate(Sepal.Length = case_when(sum(Sepal.Length) > 250 ~ sum(Sepal.Length),
                                   T ~ Sepal.Length),
         Sepal.Width = case_when(sum(Sepal.Width) > 170 ~ sum(Sepal.Width),
                                   T ~ Sepal.Width),
         Petal.Length = case_when(sum(Petal.Length) > 70 ~ sum(Petal.Length),
                                   T ~ Petal.Length),
         Petal.Width = case_when(sum(Petal.Width) > 15 ~ sum(Petal.Width),
                                   T ~ Petal.Width))
like image 101
Tom Haddow Avatar answered Oct 16 '22 02:10

Tom Haddow