I have a dataframe with one observation per row and two observations per subject. I'd like to filter out just the rows with duplicate 'day' numbers.
library(dplyr)
ex <- data.frame(id = rep(1:5, 2), day = c(1:5, 1:3, 5:6))
The following code filters out just the second duplicated row, but not the first. Again, I'd like to filter out both of the duplicated rows.
ex %>%
  group_by(id) %>%
  filter(duplicated(day))
The following code works, but seems clunky. Does anyone have a more efficient solution?
ex %>%
  group_by(id) %>%
  filter(duplicated(day, fromLast = TRUE) | duplicated(day, fromLast = FALSE))
Remove duplicate rows in R using dplyr: the distinct() function. The dplyr package provides distinct(), which removes duplicate rows based on a single variable or on several variables, keeping the first occurrence of each combination.
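As a minimal sketch of what distinct() does with the example data from the question (assuming dplyr is installed; ex_distinct is just an illustrative name), note that it keeps one copy of each repeated row rather than dropping both copies:

```r
library(dplyr)

ex <- data.frame(id = rep(1:5, 2), day = c(1:5, 1:3, 5:6))

# distinct() keeps the first occurrence of each id/day combination
ex_distinct <- ex %>% distinct(id, day)

nrow(ex_distinct)  # 7: the three repeated id/day pairs are collapsed to one row each
```

So distinct() on its own does not answer the question as asked, since one row of every duplicated pair survives.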
To remove duplicates from a data frame in base R, call duplicated() on the column of interest and negate the result with !, which keeps the unique rows. Note that this keeps the first occurrence of each value and drops only the later repeats; it does not remove every row that has a duplicate.
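A small sketch of the !duplicated() idiom on a single column, using the question's data (ex_unique_day is a name made up for illustration):

```r
ex <- data.frame(id = rep(1:5, 2), day = c(1:5, 1:3, 5:6))

# Keep only the first row seen for each value of day (ignores id entirely)
ex_unique_day <- ex[!duplicated(ex$day), ]

nrow(ex_unique_day)  # 6: one row for each of the day values 1 to 6
```

Because this looks at day alone and ignores id, it is probably not what the question needs when duplicates should be judged per subject.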
duplicated() can be applied to the whole dataset, so this can be done with base R alone. Using dplyr, we can group_by() both columns and filter() only the groups where the number of rows (n()) is greater than 1.
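The two approaches just described might look roughly like this (a sketch, assuming dplyr is installed; dups_base and dups_dplyr are illustrative names):

```r
library(dplyr)

ex <- data.frame(id = rep(1:5, 2), day = c(1:5, 1:3, 5:6))

# Base R: flag a row if it repeats an earlier row OR a later row
dups_base <- ex[duplicated(ex) | duplicated(ex, fromLast = TRUE), ]

# dplyr: keep only the id/day groups that contain more than one row
dups_dplyr <- ex %>%
  group_by(id, day) %>%
  filter(n() > 1) %>%
  ungroup()

nrow(dups_base)   # 6
nrow(dups_dplyr)  # 6
```

Both keep every copy of a duplicated id/day pair. Changing the condition to n() == 1 would keep only the rows without a duplicate instead.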
The R function duplicated() returns a logical vector where TRUE specifies which elements of a vector or data frame are duplicates.
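To see why the first attempt in the question only catches the second copy, it may help to look at the logical vector directly. This sketch uses the day column from the question as a plain vector:

```r
day <- c(1:5, 1:3, 5:6)

# Scanning left to right, only the later copy of a repeated value is TRUE
first_pass <- duplicated(day)
# FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE

# Scanning right to left, only the earlier copy is TRUE
last_pass <- duplicated(day, fromLast = TRUE)
# TRUE TRUE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE

# Combining both directions marks every element that has a twin
either <- first_pass | last_pass
```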
Single tidyverse pipe:
exSinglesOnly <-
  ex %>%
  group_by(id, day) %>%          # the complete group of interest
  mutate(duplicate = n()) %>%    # count number in each group
  filter(duplicate == 1) %>%     # select only unique records
  select(-duplicate)             # remove group count column
> exSinglesOnly
Source: local data frame [4 x 2]
Groups: id, day [4]

     id   day
  <int> <int>
1     4     4
2     5     5
3     4     5
4     5     6