
R - Filter a dataframe to include only rows where the column count meets a criterion

Assume this dataframe:

country <- c('USA', 'USA', 'USA', 'USA', 'USA', 'UK', 'UK', 'UK', 'Canada')
number <- c(1:9)
df <- data.frame(country, number)

I want to keep only the rows whose country appears more than 4 times or fewer than 2 times. So in this case, it would return:

country  number
USA      1
USA      2
USA      3
USA      4
USA      5
Canada   9

I am able to make it work with this:

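library(dplyr)
totalcounts <- filter(count(df, country), n > 4 | n < 2) # a df of the country and its count
result <- df[0, ] # empty frame with the same columns as df
for (i in seq_len(nrow(totalcounts))) {
  # rbind the rows of df whose country matches the i-th kept country
  result <- rbind(result, df[df$country == totalcounts$country[i], ])
}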

But I feel there has to be an easier way. I haven't gotten the hang of sapply and similar functions yet, so I feel like I'm missing something here. It seems like I am going the long way around when something that does this probably already exists.

Nicko asked Dec 11 '25 07:12


2 Answers

Here is a base R option using subset + ave

subset(df,!ave(number,country,FUN = function(x) length(x)%in% c(2:4)))

or a shorter version (thanks to @Onyambu)

subset(df,!ave(number,country,FUN = length) %in% 2:4)

which gives

  country number
1     USA      1
2     USA      2
3     USA      3
4     USA      4
5     USA      5
9  Canada      9
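
To see why this works, it may help to look at what ave returns on its own. The following is a minimal sketch using the df from the question; the names group_size and result are only illustrative.

```r
country <- c('USA', 'USA', 'USA', 'USA', 'USA', 'UK', 'UK', 'UK', 'Canada')
number <- 1:9
df <- data.frame(country, number)

# ave() applies FUN within each country group and returns one value per row,
# so every row carries the size of the group it belongs to.
group_size <- ave(df$number, df$country, FUN = length)

# Keep the rows whose group size falls outside 2:4,
# i.e. is greater than 4 or less than 2.
result <- df[!(group_size %in% 2:4), ]
print(result)
```

Because the group size is attached to every row, no loop or rbind is needed; the filter is a single logical index.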
ThomasIsCoding answered Dec 12 '25 20:12


A base R option using table:

tab <- table(df$country)
subset(df, country %in% names(tab[tab > 4 | tab < 2]))

#  country number
#1     USA      1
#2     USA      2
#3     USA      3
#4     USA      4
#5     USA      5
#9  Canada      9
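
The intermediate steps can be inspected one at a time. This is a rough sketch on the same df; keep and result are illustrative names, not part of the answer above.

```r
country <- c('USA', 'USA', 'USA', 'USA', 'USA', 'UK', 'UK', 'UK', 'Canada')
number <- 1:9
df <- data.frame(country, number)

# table() counts how many rows each country has.
tab <- table(df$country)

# Names of the countries whose count is greater than 4 or less than 2.
keep <- names(tab[tab > 4 | tab < 2])

# Keep only the rows belonging to those countries; row order in df is preserved.
result <- df[df$country %in% keep, ]
print(result)
```

The counting happens once per country rather than once per row, and %in% then maps the kept countries back onto the rows.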
Ronak Shah answered Dec 12 '25 22:12


