
Check if characters are all equal in a group using dplyr - R

Tags:

dataframe

r

dplyr

In the following data frame, how can I group by the first two columns and check if all the values in the fourth column are identical? If they are identical I would like to replace them with ''.

In this example, the group combinations 'Embryonated + Protein' and 'Hatching + Lipid' are the only two groups whose letters are not all 'a'.

df

         Stage variable Temperature letters       Mean
30 Embryonated Moisture          30       a  808.70882
31 Embryonated      NFE          20       a   53.28806
32 Embryonated      NFE          25       a   45.38572
33 Embryonated      NFE          30       a   84.56113
34 Embryonated  Protein          20      ab  118.53608
35 Embryonated  Protein          25       b  127.29849
36 Embryonated  Protein          30       a   84.55175
37    Hatching      Ash          20       a   16.95345
38    Hatching      Ash          25       a   14.54980
39    Hatching      Ash          30       a   13.38510
40    Hatching   Energy          20       a 4931.18857
41    Hatching   Energy          25       a 4187.27213
42    Hatching   Energy          30       a 4314.61171
43    Hatching    Lipid          20       b   26.44363
44    Hatching    Lipid          25       a   19.90928
45    Hatching    Lipid          30      ab   22.27561
46    Hatching Moisture          20       a  785.63062
47    Hatching Moisture          25       a  818.69860
48    Hatching Moisture          30       a  815.32070
49    Hatching      NFE          20       a   60.34359
50    Hatching      NFE          25       a   43.02979

I have tried using dplyr to no avail.

grp_cols <- names(df)[c(1,2)] #group by stage and variable

# Convert character vector to list of symbols
dots <- lapply(grp_cols, as.symbol)


res = df %>% group_by(.dots=dots) %>% 
  do(k=all(letters=='a')) #(returns all groups as `FALSE`)

Data:

dput(df)

structure(list(Stage = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Developing", 
"Embryonated", "Hatching", "Laid"), class = "factor"), variable = structure(c(1L, 
5L, 5L, 5L, 2L, 2L, 2L, 4L, 4L, 4L, 6L, 6L, 6L, 3L, 3L, 3L, 1L, 
1L, 1L, 5L, 5L), .Label = c("Moisture", "Protein", "Lipid", "Ash", 
"NFE", "Energy"), class = "factor"), Temperature = c("30", "20", 
"25", "30", "20", "25", "30", "20", "25", "30", "20", "25", "30", 
"20", "25", "30", "20", "25", "30", "20", "25"), letters = c("a", 
"a", "a", "a", "ab", "b", "a", "a", "a", "a", "a", "a", "a", 
"b", "a", "ab", "a", "a", "a", "a", "a"), Mean = c(808.708818349727, 
53.2880626188374, 45.3857220182952, 84.5611267892406, 118.536080769588, 
127.298486932385, 84.5517498179938, 16.9534468121571, 14.5497954869813, 
13.3850951354759, 4931.18857123979, 4187.27213494545, 4314.61171127083, 
26.4436265667305, 19.9092762683653, 22.2756088142943, 785.630624024365, 
818.698598619779, 815.320702070777, 60.3435858953567, 43.0297881562102
)), .Names = c("Stage", "variable", "Temperature", "letters", 
"Mean"), row.names = 30:50, class = "data.frame")
J.Con asked May 04 '18 04:05



1 Answer

Group the data by Stage and variable, count the distinct values of letters within each group with n_distinct(), and replace the letters with '' in every group where that count is 1:

library(dplyr)

df %>%
  group_by(Stage, variable) %>%
  # the condition is a single TRUE/FALSE per group, recycled over the group's rows
  mutate(letters = replace(letters, n_distinct(letters) == 1, ''))
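
The reason a single condition is enough: n_distinct(letters) == 1 evaluates to one TRUE or FALSE per group, and replace() recycles that one logical over the whole group, so either every value in the group is replaced or none is. A minimal sketch on a made-up data frame (toy, grp and its values are hypothetical, not the question's data):

```r
library(dplyr)

# Hypothetical toy data: group "x" has identical letters, group "y" does not
toy <- data.frame(
  grp     = c("x", "x", "y", "y"),
  letters = c("a", "a", "a", "b"),
  stringsAsFactors = FALSE
)

out <- toy %>%
  group_by(grp) %>%
  # one TRUE/FALSE per group; TRUE blanks every row of the group, FALSE leaves it alone
  mutate(letters = replace(letters, n_distinct(letters) == 1, "")) %>%
  ungroup()

out$letters
```

Here group "x" should come back blanked and group "y" unchanged.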

Similar logic works in data.table too:

library(data.table)
setDT(df)
df[, letters := if(uniqueN(letters)==1) '' else letters, by=.(Stage,variable)]
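
As for why the attempt in the question likely returned FALSE for every group: do() does not look up column names directly (the current group is available as `.`), so a bare `letters` falls through to R's built-in `letters` constant, the lowercase alphabet, which is never all 'a'. A small sketch of that behaviour, next to a summarise() call where the column is found (toy and grp are hypothetical names, not from the question):

```r
library(dplyr)

# Outside a data mask, `letters` is the built-in alphabet constant
builtin_check <- all(letters == "a")   # FALSE

# Hypothetical toy data: group "x" is all "a", group "y" is not
toy <- data.frame(
  grp     = c("x", "x", "y", "y"),
  letters = c("a", "a", "a", "b"),
  stringsAsFactors = FALSE
)

# Inside summarise(), `letters` refers to the column of the current group
res <- toy %>%
  group_by(grp) %>%
  summarise(all_a = all(letters == "a"))

res$all_a
```

Inside do(), the equivalent check would have to be written against the group explicitly, for example all(.$letters == 'a').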
thelatemail answered Oct 22 '22 02:10