
Filter group of rows based on sum of values from different column

Tags:

r

dplyr

I'm trying to filter out whole groups of rows in R, but only when the frequencies for a particular headword don't add up to more than 5.

The data I have looks roughly like this. It's a data frame that I'm currently calling "Words":

HEADWORD VARIANT FREQUENCY
 SWORD    sword      2
 SWORD    swerd      1
 SWORD    sworde     1
 KNIGHT   knight     6
 KNIGHT   kniht      2
 KNIGHT   knyt       1

I only want rows for which the frequencies within a particular headword add up to more than 5. So here, I want to keep all the instances of KNIGHT but I want to get rid of all the SWORD rows entirely.
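To make the examples reproducible, the table above can be rebuilt in R. This is a sketch: the column types (character headwords and variants, integer frequencies) are assumed from the printed table. It also computes each headword's total, which is the quantity the filter actually has to test.

```r
# Rebuild the sample data from the question as a data frame.
# Column types (character / integer) are assumed from the printed table.
Words <- data.frame(
  HEADWORD  = c("SWORD", "SWORD", "SWORD", "KNIGHT", "KNIGHT", "KNIGHT"),
  VARIANT   = c("sword", "swerd", "sworde", "knight", "kniht", "knyt"),
  FREQUENCY = c(2L, 1L, 1L, 6L, 2L, 1L),
  stringsAsFactors = FALSE
)

# Total frequency per headword: SWORD = 4, KNIGHT = 9,
# so only KNIGHT clears the "more than 5" threshold.
totals <- tapply(Words$FREQUENCY, Words$HEADWORD, sum)
totals
```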

I tried to do this with dplyr, but without success. This is the code I tried:

Words1 %>% group_by(HEADWORD) %>% filter(FREQUENCY > 5)
asked Oct 12 '25 18:10 by Rose


2 Answers

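We need to compute the sum of 'FREQUENCY' and check whether it is greater than 5 inside filter, after grouping by 'HEADWORD'. Because the data are grouped, sum(FREQUENCY) gives one total per headword, so all rows of a headword are kept or dropped together, whereas the attempt in the question compared each row's own frequency with 5.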

library(dplyr)

Words1 %>%
  group_by(HEADWORD) %>%
  filter(sum(FREQUENCY) > 5)
#   HEADWORD VARIANT FREQUENCY
#     <chr>   <chr>     <int>
#1   KNIGHT  knight         6
#2   KNIGHT   kniht         2 
#3   KNIGHT    knyt         1
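A minimal sketch contrasting the row-wise and group-wise filters, assuming the data is rebuilt as Words1 with the values from the question's table. It also adds ungroup(), which is not part of the answer above: filter() leaves the result grouped by HEADWORD, and ungrouping avoids surprises in later steps.

```r
library(dplyr)

# Re-creation of the question's data (same values as the table; types assumed).
Words1 <- data.frame(
  HEADWORD  = c("SWORD", "SWORD", "SWORD", "KNIGHT", "KNIGHT", "KNIGHT"),
  VARIANT   = c("sword", "swerd", "sworde", "knight", "kniht", "knyt"),
  FREQUENCY = c(2L, 1L, 1L, 6L, 2L, 1L),
  stringsAsFactors = FALSE
)

# Row-wise test: each row's own FREQUENCY is compared with 5,
# so grouping changes nothing and only the "knight" row survives.
per_row <- Words1 %>%
  group_by(HEADWORD) %>%
  filter(FREQUENCY > 5)

# Group-wise test: sum(FREQUENCY) is one value per HEADWORD,
# recycled to every row of that group, so whole groups are kept or dropped.
per_group <- Words1 %>%
  group_by(HEADWORD) %>%
  filter(sum(FREQUENCY) > 5) %>%
  ungroup()   # drop the grouping so later steps act on the whole table

nrow(per_row)    # 1
nrow(per_group)  # 3
```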
answered Oct 14 '25 11:10 by akrun


You can also use the base R ave function, which needs no extra packages (here df is the data frame):

df[ave(df$FREQUENCY, df$HEADWORD, FUN = sum) > 5, ]

#   HEADWORD VARIANT FREQUENCY
#4   KNIGHT  knight         6
#5   KNIGHT   kniht         2
#6   KNIGHT    knyt         1
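To see what ave returns, here is a sketch that rebuilds the question's data as df (values taken from the table; column types assumed) and keeps the intermediate vector. ave(x, g, FUN = sum) replaces each element of x with the sum of its group, so the result lines up row-by-row with df and can be used directly as a row index; tapply, by contrast, returns one value per group.

```r
# Re-creation of the question's data; the name df follows the answer above.
df <- data.frame(
  HEADWORD  = c("SWORD", "SWORD", "SWORD", "KNIGHT", "KNIGHT", "KNIGHT"),
  VARIANT   = c("sword", "swerd", "sworde", "knight", "kniht", "knyt"),
  FREQUENCY = c(2L, 1L, 1L, 6L, 2L, 1L),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each HEADWORD group and returns a vector the same
# length as df$FREQUENCY, with each row holding its own group's total.
group_total <- ave(df$FREQUENCY, df$HEADWORD, FUN = sum)
group_total        # 4 4 4 9 9 9

# Comparing that vector with 5 gives one TRUE/FALSE per row,
# which indexes the rows directly.
keep <- group_total > 5
df[keep, ]
```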
answered Oct 14 '25 12:10 by Ronak Shah