
Group by, take count and filter out entries corresponding to count greater than 1 [duplicate]

Tags: r, dplyr

The following is my data,

data

date             number     value
2016-05-05         1          5
2016-05-05         1          6
2016-05-06         2          7
2016-05-06         2          8
2016-05-07         3          9 
2016-05-08         4          10
2016-05-09         5          11

When I use the following command,

data %>% group_by(date, number) %>% summarize(count = n())

I get the following,

date             number        count 
2016-05-05         1             2
2016-05-06         2             2
2016-05-07         3             1
2016-05-08         4             1
2016-05-09         5             1

Now I want to filter out the entries whose count is greater than 1, i.e. remove every row belonging to a (date, number) combination that occurs more than once. My output should look like the following,

data

date             number     value
2016-05-07         3          9 
2016-05-08         4          10
2016-05-09         5          11

where the first four entries have been filtered out because their (date, number) combinations have a count greater than 1. Can anybody help me do this, or give me some idea of how to approach it?

haimen asked Aug 01 '16 17:08


1 Answer

We can group by 'date' and 'number', then use filter to keep only the rows whose group contains exactly one row, i.e. where n() is equal to 1.

library(dplyr)
data %>% 
     group_by(date, number) %>%   # one group per (date, number) pair
     filter(n() == 1)             # keep only groups that contain a single row
#        date number value
#        <chr>  <int> <int>
#1 2016-05-07      3     9
#2 2016-05-08      4    10
#3 2016-05-09      5    11
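
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data from the question. The column types (character dates, integer number and value) are an assumption based on the printed output above, and the trailing ungroup() is an optional addition so that later verbs operate on the whole table rather than per group.

```r
library(dplyr)

# Sample data reconstructed from the question; column types are assumed.
data <- data.frame(
  date   = c("2016-05-05", "2016-05-05", "2016-05-06", "2016-05-06",
             "2016-05-07", "2016-05-08", "2016-05-09"),
  number = c(1L, 1L, 2L, 2L, 3L, 4L, 5L),
  value  = 5:11,
  stringsAsFactors = FALSE
)

result <- data %>%
  group_by(date, number) %>%   # one group per (date, number) pair
  filter(n() == 1) %>%         # keep rows whose group has exactly one row
  ungroup()                    # optional: drop the grouping afterwards

print(result)
```

Because filter() is applied to the grouped rows directly, there is no need to summarize the counts first and join them back.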

Just to provide some alternatives, here is one using data.table:

library(data.table)
setDT(data)[, if (.N == 1) .SD, by = .(date, number)]
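
Note that setDT() converts data to a data.table by reference, so the original object changes class. If that is not wanted, a sketch along the following lines works on a copy instead. The sample data is rebuilt from the question with assumed column types, and the names dt and unique_rows are only illustrative.

```r
library(data.table)

# Sample data reconstructed from the question; column types are assumed.
data <- data.frame(
  date   = c("2016-05-05", "2016-05-05", "2016-05-06", "2016-05-06",
             "2016-05-07", "2016-05-08", "2016-05-09"),
  number = c(1L, 1L, 2L, 2L, 3L, 4L, 5L),
  value  = 5:11,
  stringsAsFactors = FALSE
)

# as.data.table() copies, so `data` itself stays a plain data.frame.
dt <- as.data.table(data)

# For each (date, number) group, return its rows (.SD) only when the
# group has exactly one row (.N == 1); other groups return nothing.
unique_rows <- dt[, if (.N == 1L) .SD, by = .(date, number)]

print(unique_rows)
```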

Or with base R:

data[with(data, ave(number, number, date, FUN = length) == 1), ]
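
To see why the ave() version works, it can help to look at the intermediate vector. The sketch below rebuilds the sample data from the question (column types assumed) and stores the per-row group size in a variable named counts, which is only an illustrative name.

```r
# Sample data reconstructed from the question; column types are assumed.
data <- data.frame(
  date   = c("2016-05-05", "2016-05-05", "2016-05-06", "2016-05-06",
             "2016-05-07", "2016-05-08", "2016-05-09"),
  number = c(1L, 1L, 2L, 2L, 3L, 4L, 5L),
  value  = 5:11,
  stringsAsFactors = FALSE
)

# ave() applies FUN within each (number, date) group and returns a vector
# as long as the input, so every row receives the size of its own group.
counts <- with(data, ave(number, number, date, FUN = length))
print(counts)   # group sizes per row: 2 2 2 2 1 1 1

# Keep only the rows whose group size is exactly 1.
kept <- data[counts == 1, ]
print(kept)
```

The first argument to ave() only supplies a vector of the right length here; its values are irrelevant because FUN = length counts elements rather than using them.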

akrun answered Oct 06 '22 00:10