Find indices of duplicated rows [duplicate]

Tags: dataframe, r, duplicates

The duplicated function in R finds duplicate rows. To remove the duplicates, we just need to write df[!duplicated(df), ] and they are dropped from the data frame.
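As a minimal sketch of that behaviour (the data frame below is made up for illustration and is not part of the original question), duplicated flags only the later copy of a repeated row, and negating it keeps one copy of each row:

```r
# Hypothetical example data: row 3 repeats row 2.
df <- data.frame(a = c(1, 2, 2, 3), b = c("x", "y", "y", "z"))

# TRUE marks a row that is identical to some earlier row.
is_dup <- duplicated(df)

# Keep only the rows that are not flagged, i.e. the first copy of each row.
deduped <- df[!is_dup, ]
```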

But how can I find the indices of the duplicated data? If duplicated returns TRUE for a row, that row is a second (or later) occurrence of a row that appears earlier in the data frame, and its index is easy to obtain. How can I obtain the index of the first occurrence of that row? In other words, the index of the row that the duplicated row is identical to?

I could loop over the data frame, but I think there is a more elegant answer to this question.

967

asked Sep 19 '12 13:09

annndrey

1 Answer

Here's an example:

df <- data.frame(a = c(1,2,3,4,1,5,6,4,2,1))

duplicated(df) | duplicated(df, fromLast = TRUE)
#[1]  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE

How does it work?

duplicated(df) returns a logical vector that is TRUE for every row that repeats an earlier row, so the first occurrence of each row is FALSE. With fromLast = TRUE the search runs from the last row backwards, so the last occurrence is FALSE and every earlier copy is TRUE. The two logical vectors are combined with |, since a TRUE in at least one of them means the row has a duplicate somewhere in the data frame.
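To go from the logical vector to actual indices, and to find which earlier row each duplicate matches, one possible approach is sketched below. It is not part of the original answer. It assumes that pasting the columns of a row into a single string identifies the row uniquely, which may not hold if the separator occurs in the data or if numeric columns differ only beyond the digits that paste prints. The names dup_idx, key, first_idx, later and pairs are chosen here for illustration.

```r
# Example data from the answer: rows 1, 5 and 10 are identical,
# as are rows 2 and 9, and rows 4 and 8.
df <- data.frame(a = c(1, 2, 3, 4, 1, 5, 6, 4, 2, 1))

# Indices of every row that has a duplicate somewhere in the data frame.
dup_idx <- which(duplicated(df) | duplicated(df, fromLast = TRUE))

# Collapse each row into one string key so that match() can compare whole rows.
# Assumption: the separator "\r" does not occur in the data.
key <- do.call(paste, c(df, sep = "\r"))

# match() returns, for each key, the position of its first occurrence.
first_idx <- match(key, key)

# For each later occurrence, the index of the first row it duplicates.
later <- which(duplicated(df))
pairs <- data.frame(row = later, first_occurrence = first_idx[later])
```

Here pairs lists each repeated row next to the index of the first row it is identical to, which avoids an explicit loop over the data frame.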

135

answered Oct 15 '22 15:10

Sven Hohenstein
