
Find duplicated rows (based on 2 columns) in Data Frame in R

I have a data frame in R which looks like:

| RIC    | Date                | Open   |
|--------|---------------------|--------|
| S1A.PA | 2011-06-30 20:00:00 | 23.7   |
| ABC.PA | 2011-07-03 20:00:00 | 24.31  |
| EFG.PA | 2011-07-04 20:00:00 | 24.495 |
| S1A.PA | 2011-07-05 20:00:00 | 24.23  |

I want to know whether there are any duplicates with respect to the combination of RIC and Date. Is there a function for that in R?

user802231 asked Aug 08 '11 18:08



2 Answers

You can always try simply passing those first two columns to the function duplicated:

    duplicated(dat[, 1:2])

assuming your data frame is called dat. For more information, consult the help file for the duplicated function by typing ?duplicated at the console. Its description reads:

Determines which elements of a vector or data frame are duplicates of elements with smaller subscripts, and returns a logical vector indicating which elements (rows) are duplicates.

So duplicated returns a logical vector, which we can then use to extract a subset of dat:

    ind <- duplicated(dat[, 1:2])
    dat[ind, ]

or you can skip the separate assignment step and simply use:

    dat[duplicated(dat[, 1:2]), ]
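
Note that duplicated flags only the later occurrences of a repeated combination, so the first row of each duplicated pair is not returned by the subsetting above. The following is a minimal sketch of how one might check for duplicates and, if wanted, pull out every copy including the first. The sample data is made up for illustration (it is not the asker's data; one RIC/Date pair is deliberately repeated), and the names key, later_copies, and all_copies are hypothetical.

```r
# Illustrative data with one repeated RIC/Date combination (rows 1 and 3).
dat <- data.frame(
  RIC  = c("S1A.PA", "ABC.PA", "S1A.PA", "EFG.PA"),
  Date = as.POSIXct(c("2011-06-30 20:00:00", "2011-07-03 20:00:00",
                      "2011-06-30 20:00:00", "2011-07-04 20:00:00"),
                    tz = "UTC"),
  Open = c(23.7, 24.31, 23.9, 24.495),
  stringsAsFactors = FALSE
)

# Selecting the key columns by name is less fragile than relying on 1:2.
key <- dat[, c("RIC", "Date")]

# TRUE only for rows that repeat an earlier row (here: row 3).
later_copies <- duplicated(key)

# Answers the yes/no question "are there any duplicates at all?"
has_dupes <- any(later_copies)

# Scanning from the end as well marks the first occurrence too,
# so combining both directions flags every copy (here: rows 1 and 3).
all_copies <- later_copies | duplicated(key, fromLast = TRUE)

dat[all_copies, ]
```

Whether you want only the later copies or all copies depends on whether the goal is to drop the extras or to inspect the conflicting rows side by side.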
joran answered Oct 06 '22 06:10


dplyr is so much nicer for this sort of thing:

    library(dplyr)

    yourDataFrame %>%
        distinct(RIC, Date, .keep_all = TRUE)

(The .keep_all argument is optional. If it is omitted, only the deduplicated RIC and Date columns are returned; with .keep_all = TRUE, the whole data frame is returned, keeping the first row of each RIC/Date combination.)
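
Keep in mind that distinct removes duplicates rather than reporting them. If the goal is to see which rows are duplicated, one possible dplyr approach is to group by the two columns and keep groups with more than one row. This is a sketch on made-up sample data (not the asker's), and the names dat, deduped, and dupes are hypothetical.

```r
library(dplyr)

# Illustrative data with one repeated RIC/Date combination (rows 1 and 3).
dat <- data.frame(
  RIC  = c("S1A.PA", "ABC.PA", "S1A.PA", "EFG.PA"),
  Date = as.POSIXct(c("2011-06-30 20:00:00", "2011-07-03 20:00:00",
                      "2011-06-30 20:00:00", "2011-07-04 20:00:00"),
                    tz = "UTC"),
  Open = c(23.7, 24.31, 23.9, 24.495)
)

# Drop repeats: one row per RIC/Date combination, all columns kept.
deduped <- dat %>%
  distinct(RIC, Date, .keep_all = TRUE)

# Report repeats: every row whose RIC/Date combination occurs more than once.
dupes <- dat %>%
  group_by(RIC, Date) %>%
  filter(n() > 1) %>%
  ungroup()

dupes
```

Here deduped keeps three rows, while dupes contains both copies of the repeated combination so their Open values can be compared.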

Guy Manova answered Oct 06 '22 06:10