R - find and list duplicate rows based on two columns

Tags:

r

Using R. Base package, dplyr, or data.table are all okay for me to use. My data is ~1000 rows x 20 columns. I expect about 300 duplicates.

I'd like to do something like the following, but with one alteration:

Match/group duplicate rows (indices)

I'd like to find, not fully duplicated rows, but rows duplicated in two columns. For example, given this input table:

File     T.N     ID     Col1     Col2
BAI.txt   T      1       sdaf    eiri
BAJ.txt   N      2       fdd     fds
BBK.txt   T      1       ter     ase
BCD.txt   N      1       twe     ase

If I want to find duplicates in T.N & ID only, I'd end up with the following table:

File     T.N     ID     Col1     Col2
BAI.txt   T      1       sdaf    eiri
BBK.txt   T      1       ter     ase
asked Mar 11 '16 by Gaius Augustus


3 Answers

I have found this to be an easy and useful method.

library(dplyr)
library(tibble)

tr <- tribble(~File,      ~TN, ~ID, ~Col1,  ~Col2,
              'BAI.txt',  'T',   1, 'sdaf', 'eiri',
              'BAJ.txt',  'N',   2, 'fdd',  'fds',
              'BBK.txt',  'T',   1, 'ter',  'ase',
              'BCD.txt',  'N',   1, 'twe',  'ase')

group_by(tr, TN, ID) %>%
  filter(n() > 1)

Output:

# A tibble: 2 x 5
# Groups:   TN, ID [1]
  File    TN       ID Col1  Col2 
  <chr>   <chr> <dbl> <chr> <chr>
1 BAI.txt T         1 sdaf  eiri 
2 BBK.txt T         1 ter   ase  
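If you'd rather not carry the grouping into the result, an equivalent sketch (not from the original answer) uses add_count, which appends a group-size column without leaving the tibble grouped; the column name n is dplyr's default:

```r
library(dplyr)
library(tibble)

tr <- tribble(~File,      ~TN, ~ID, ~Col1,  ~Col2,
              'BAI.txt',  'T',   1, 'sdaf', 'eiri',
              'BAJ.txt',  'N',   2, 'fdd',  'fds',
              'BBK.txt',  'T',   1, 'ter',  'ase',
              'BCD.txt',  'N',   1, 'twe',  'ase')

# add_count() adds a column n holding the size of each TN + ID group;
# filter on it, then drop the helper column
dupes <- tr %>%
  add_count(TN, ID) %>%
  filter(n > 1) %>%
  select(-n)
```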
answered Nov 09 '22 by Robin


Here is an option using duplicated twice, the second time with fromLast = TRUE, because duplicated alone returns TRUE only from the second occurrence of a value onward:

dupe = data[,c('T.N','ID')] # select columns to check duplicates
data[duplicated(dupe) | duplicated(dupe, fromLast=TRUE),]

#     File T.N ID Col1 Col2
#1 BAI.txt   T  1 sdaf eiri
#3 BBK.txt   T  1  ter  ase
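The same duplicated-twice trick can be wrapped in a small reusable helper (a sketch; the function name find_dup_rows is not from the original answer), shown here with the example data built as a data frame:

```r
# Return all rows that are duplicated in the given key columns
find_dup_rows <- function(df, cols) {
  key <- df[, cols, drop = FALSE]
  # duplicated() marks 2nd, 3rd, ... occurrences; adding fromLast = TRUE
  # also marks the first occurrence of each duplicated key
  df[duplicated(key) | duplicated(key, fromLast = TRUE), ]
}

data <- data.frame(
  File = c('BAI.txt', 'BAJ.txt', 'BBK.txt', 'BCD.txt'),
  T.N  = c('T', 'N', 'T', 'N'),
  ID   = c(1, 2, 1, 1),
  Col1 = c('sdaf', 'fdd', 'ter', 'twe'),
  Col2 = c('eiri', 'fds', 'ase', 'ase'),
  stringsAsFactors = FALSE
)

dupes <- find_dup_rows(data, c('T.N', 'ID'))
```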
answered Nov 09 '22 by Veerendra Gadekar


A simple solution is find_duplicates from the hablar package:

library(dplyr)
library(data.table)
library(hablar)

df <- fread("
  File     T.N     ID     Col1     Col2
  BAI.txt   T      1       sdaf    eiri
  BAJ.txt   N      2       fdd     fds
  BBK.txt   T      1       ter     ase
  BCD.txt   N      1       twe     ase
            ")

df %>% 
  find_duplicates(T.N, ID)

which returns the rows with duplicates in T.N and ID:

  File    T.N      ID Col1  Col2 
  <chr>   <chr> <int> <chr> <chr>
1 BAI.txt T         1 sdaf  eiri 
2 BBK.txt T         1 ter   ase 
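Since the question also allows data.table, here is a sketch of the same two-column filter in that package (not from the original answers); note that the by columns are moved to the front of the result:

```r
library(data.table)

dt <- data.table(File = c('BAI.txt', 'BAJ.txt', 'BBK.txt', 'BCD.txt'),
                 T.N  = c('T', 'N', 'T', 'N'),
                 ID   = c(1L, 2L, 1L, 1L),
                 Col1 = c('sdaf', 'fdd', 'ter', 'twe'),
                 Col2 = c('eiri', 'fds', 'ase', 'ase'))

# keep only the T.N + ID groups that occur more than once;
# .SD holds the non-grouping columns of each group
res <- dt[, if (.N > 1) .SD, by = .(T.N, ID)]
```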
answered Nov 09 '22 by davsjob