`duplicated` has a `fromLast` argument. The "Example" section of `?duplicated` shows you how to use it. Just call `duplicated` twice, once with `fromLast = FALSE` and once with `fromLast = TRUE`, and take the rows where either is `TRUE`.
A late edit: you didn't provide a reproducible example, so here's an illustration kindly contributed by @jbaums:
vec <- c("a", "b", "c","c","c")
vec[duplicated(vec) | duplicated(vec, fromLast=TRUE)]
## [1] "c" "c" "c"
Edit: And an example for the case of a data frame:
df <- data.frame(rbind(c("a","a"),c("b","b"),c("c","c"),c("c","c")))
df[duplicated(df) | duplicated(df, fromLast=TRUE), ]
## X1 X2
## 3 c c
## 4 c c
You need to assemble the set of duplicated values, apply `unique`, and then test with `%in%`. As always, a sample problem will make this process come alive.
> vec <- c("a", "b", "c","c","c")
> vec[ duplicated(vec)]
[1] "c" "c"
> unique(vec[ duplicated(vec)])
[1] "c"
> vec %in% unique(vec[ duplicated(vec)])
[1] FALSE FALSE TRUE TRUE TRUE
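The same `%in%` test works on a data-frame column. Here's a minimal sketch, reusing the data frame from the earlier answer (the column name `X1` is the default assigned by `data.frame`):

df <- data.frame(rbind(c("a","a"), c("b","b"), c("c","c"), c("c","c")))
# keep every row whose X1 value occurs more than once
df[df$X1 %in% unique(df$X1[duplicated(df$X1)]), ]
##   X1 X2
## 3  c  c
## 4  c  c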
Duplicated rows in a data frame can be obtained with `dplyr`:
library(tidyverse)
df <- bind_rows(iris, head(iris, 20))  # build some test data
df %>% group_by_all() %>% filter(n() > 1) %>% ungroup()
To exclude certain columns, `group_by_at(vars(-var1, -var2))` could be used instead to group the data.
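For instance, here's a minimal sketch on the same test data that ignores the `Species` column when deciding what counts as a duplicate (column names taken from `iris`):

library(tidyverse)
df <- bind_rows(iris, head(iris, 20))
# group by every column except Species, then keep the groups of size > 1
df %>% group_by_at(vars(-Species)) %>% filter(n() > 1) %>% ungroup()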
If the row indices, and not just the data, are actually needed, you could add them first, as in:
df %>% rownames_to_column() %>% group_by_at(vars(-rowname)) %>% filter(n() > 1) %>% pull(rowname)
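If you don't need the pipeline, a base-R sketch combining `which` with the `duplicated` trick from the first answer returns the indices directly:

df <- dplyr::bind_rows(iris, head(iris, 20))  # same test data as above
# row indices of every row that occurs more than once
which(duplicated(df) | duplicated(df, fromLast = TRUE))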
I've had the same question, and if I'm not mistaken, this is also an answer. For a data frame `df` with a column `col`:

df[df$col %in% unique(df$col[duplicated(df$col)]), ]
I don't know which one is faster, though; the dataset I'm currently using isn't big enough for tests that produce significant time gaps.
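If you want to compare, here's a minimal timing sketch with base `system.time` on synthetic data (the vector size and alphabet are arbitrary assumptions):

set.seed(1)
x <- sample(letters, 1e7, replace = TRUE)  # arbitrary large test vector
# approach 1: two passes of duplicated()
system.time(x[duplicated(x) | duplicated(x, fromLast = TRUE)])
# approach 2: unique() + %in%
system.time(x[x %in% unique(x[duplicated(x)])])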