
Find indices of duplicated rows [duplicate]

The duplicated function in R finds duplicated rows. To remove them, it is enough to write df[!duplicated(df), ], which drops every repeated row from the data frame.

But how do I find the indices of the duplicated data? If duplicated returns TRUE for some row, that row is a second (or later) occurrence of a row already in the data frame, and its index is easy to obtain. How do I obtain the index of the first occurrence of that row? In other words, the index of the row that the duplicated row is identical to?

I could loop over the data frame, but I think there is a more elegant answer to this question.

annndrey asked Sep 19 '12 13:09




1 Answer

Here's an example:

    df <- data.frame(a = c(1, 2, 3, 4, 1, 5, 6, 4, 2, 1))

    duplicated(df) | duplicated(df, fromLast = TRUE)
    #[1]  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE

How does it work?

The call duplicated(df) marks every row that repeats an earlier row, so the first occurrence of each value stays FALSE. With fromLast = TRUE the data are scanned from the end, so every row that repeats a later row is marked and the last occurrence stays FALSE. The two resulting logical vectors are combined with |, since a TRUE in at least one of them means the row has a duplicate somewhere in the data frame.
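To get actual row numbers, and the index of the first occurrence that each repeated row matches, something along these lines might work. This is only a sketch, not part of the original answer: the variable names (key, first_idx, pairs) are made up for illustration, and it assumes that pasting a row's columns into one string identifies the row uniquely, which may not hold for numbers that differ only beyond about 15 significant digits.

```r
df <- data.frame(a = c(1, 2, 3, 4, 1, 5, 6, 4, 2, 1))

# Row numbers of every row that belongs to a group of duplicates.
dup_rows <- which(duplicated(df) | duplicated(df, fromLast = TRUE))

# One string key per row (assumption: the pasted columns identify a row).
key <- do.call(paste, c(unname(as.list(df)), sep = "\r"))

# match(key, key) gives, for each row, the index of its first occurrence.
first_idx <- match(key, key)

# For each repeated row, the index of the row it duplicates.
is_dup <- duplicated(df)
pairs <- data.frame(row = which(is_dup), first = first_idx[is_dup])
pairs
```

For the example data, pairs maps rows 5, 8, 9 and 10 to first occurrences 1, 4, 2 and 1, so no explicit loop over the data frame is needed.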

Sven Hohenstein answered Oct 15 '22 15:10
