Check if each row of a data frame is contained in another data frame

Tags: dataframe, r

I wrote the following function, and it works. However, it is very slow when df1 has 1,700 rows and df2 has 70,000 rows. Is there any way to improve its efficiency?

rowcheck <- function(df1, df2) {
  apply(df1, 1, function(x) any(apply(df2, 1, function(y) all(y == x))))
}

Here is an example of the kind of data I wrote this function for:

df1=data.frame(a=c(1:3),b=c("a","b","c"))
df2=data.frame(a=c(1:6),b=rep(c("a","b","c"),2))

For each row of df1, I want to check whether it appears as a row in df2. The function should return a logical vector of length nrow(df1).

Thank you for your help.

asked Mar 26 '14 21:03

Bruce Chen

2 Answers

One way is to paste each row into a single string and compare the strings with %in%. The result is a logical vector of length nrow(df1), as requested.

do.call(paste0, df1) %in% do.call(paste0, df2)
# [1] TRUE TRUE TRUE
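One thing to watch for with this approach: paste0 joins the columns with no separator, so two different rows can collapse to the same string. The sketch below uses made-up data (not from the question) to show such a collision, and one possible guard: pasting with a separator that is assumed not to occur in the data. The choice of "\r" is only an illustration.

```r
# Hypothetical rows that differ, yet paste0 maps both to "111"
x1 <- data.frame(a = 1,  b = "11")
x2 <- data.frame(a = 11, b = "1")

naive <- do.call(paste0, x1) %in% do.call(paste0, x2)
naive
# [1] TRUE   (false positive)

# Guard: join columns with a separator assumed to be absent from the data
sep <- "\r"
safe <- do.call(paste, c(x1, sep = sep)) %in% do.call(paste, c(x2, sep = sep))
safe
# [1] FALSE
```

If the data can contain arbitrary characters, no single separator is fully safe, so this is a mitigation rather than a guarantee.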

answered Oct 13 '22 04:10

Rich Scriven

Try:

Filter(function(x) x > 0, which(duplicated(rbind(df2, df1))) - nrow(df2))

This returns the row numbers of df1 that occur in df2. If you want a logical vector instead, as in Richard Scriven's answer, try:

duplicated(rbind(df2, df1))[-seq_len(nrow(df2))]

It is also faster than the original rowcheck, since duplicated is implemented internally in C (rowcheck2 below is my version):

> microbenchmark(rowcheck(df1, df2), rowcheck2(df1, df2))
Unit: milliseconds
                expr      min       lq   median       uq       max neval
  rowcheck(df1, df2) 2.045210 2.169182 2.328296 3.539328 13.971517   100
 rowcheck2(df1, df2) 1.046207 1.112395 1.243390 1.727921  7.442499   100
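The benchmark calls rowcheck2 without showing its definition. A minimal sketch, assuming it simply wraps the duplicated one-liner above, might look like this. It uses tail() rather than negative indexing so that an empty df2 is handled too. The dup_df1 data frame is made up to illustrate a caveat: a row that repeats within df1 is flagged from its second occurrence onward, even if df2 does not contain it.

```r
# Assumed definition of rowcheck2: a wrapper around the duplicated() one-liner
rowcheck2 <- function(df1, df2) {
  # Stack df1 under df2, flag repeated rows, keep only the flags for df1's rows
  tail(duplicated(rbind(df2, df1)), nrow(df1))
}

df1 <- data.frame(a = 1:3, b = c("a", "b", "c"))
df2 <- data.frame(a = 1:6, b = rep(c("a", "b", "c"), 2))
in_df2 <- rowcheck2(df1, df2)
in_df2
# [1] TRUE TRUE TRUE

# Caveat (hypothetical data): the row (9, "z") is not in df2, but its
# second copy in dup_df1 duplicates the first copy and is flagged anyway.
dup_df1 <- data.frame(a = c(9L, 9L), b = c("z", "z"))
dup_flags <- rowcheck2(dup_df1, df2)
dup_flags
# [1] FALSE  TRUE
```

If df1 may contain repeated rows, comparing per-row keys with %in% avoids this particular problem, since each df1 row is then checked against df2 only.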

answered Oct 13 '22 04:10

Robert Krzyzanowski
