faster way to compare rows in a data frame

Tags: r

1 Answer

Here is an Rcpp solution. It returns every pair of rows that agree in more than three columns, together with the number of matching columns. However, if the result matrix gets too big (i.e., there are too many hits), this will throw an error. I run the loops twice: first to get the necessary size of the result matrix, then to fill it. There is probably a better way. Also, this only works with integers. If your matrix is numeric, you'll have to deal with floating-point precision, because comparing doubles with == is unreliable.

library(Rcpp)
library(inline)

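#C++ code:
body <- '
// M has n rows and m columns; rows are compared pairwise.
const IntegerMatrix        M(as<IntegerMatrix>(MM));
const int                  m=M.ncol(), n=M.nrow();
long                        count1;  // number of row pairs with enough matches
int                         count2;  // matching columns for the current pair
// First pass: count the hits to size the result matrix.
count1 = 0;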
for (int i=0; i<(n-1); i++)
{
   for (int j=(i+1); j<n; j++)
   {
     count2 = 0;
     for (int k=0; k<m; k++) {
        if (M(i,k)==M(j,k)) count2++;
     }
     if (count2>3) count1++;
   } 
}
// Second pass: fill the result (row i, row j, number of matches).
IntegerMatrix              R(count1,3);
count1 = 0;
for (int i=0; i<(n-1); i++)
{
   for (int j=(i+1); j<n; j++)
   {
     count2 = 0;
     for (int k=0; k<m; k++) {
        if (M(i,k)==M(j,k)) count2++;
     }
     if (count2>3) {
        count1++;
        // +1 converts to the 1-based row indices used in R
        R(count1-1,0) = i+1;
        R(count1-1,1) = j+1;
        R(count1-1,2) = count2;
     }
   } 
}
return  wrap(R);
'

fun <- cxxfunction(signature(MM = "matrix"), 
                     body,plugin="Rcpp")

#with your data (columns: row i, row j, number of matching columns)
fun(as.matrix(data))
#      [,1] [,2] [,3]
# [1,]    1    2    4
# [2,]    1    4    5
# [3,]    2    4    4

#Benchmarks
set.seed(42)
mat1 <- matrix(sample(1:10,250*26,TRUE),ncol=26)
mat2 <- matrix(sample(1:10,2500*26,TRUE),ncol=26)
mat3 <- matrix(sample(1:10,10000*26,TRUE),ncol=26)
mat4 <- matrix(sample(1:10,25000*26,TRUE),ncol=26)
library(microbenchmark)
microbenchmark(
  fun(mat1),
  fun(mat2),
  fun(mat3),
  fun(mat4),
  times=3
  )
# Unit: milliseconds
#      expr          min           lq       median           uq          max neval
# fun(mat1)     2.675568     2.689586     2.703603     2.732487     2.761371     3
# fun(mat2)   272.600480   274.680815   276.761151   276.796217   276.831282     3
# fun(mat3)  4623.875203  4643.634249  4663.393296  4708.067638  4752.741979     3
# fun(mat4) 29041.878164 29047.151348 29052.424532 29235.839275 29419.254017     3
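
As noted above, running the loops twice is probably not the best option, and the integer comparison does not carry over to numeric matrices. A minimal sketch of one possible alternative follows, written in plain C++ without Rcpp so that it compiles on its own. It collects hits in a growing std::vector in a single pass and takes the equality test as a parameter. The names RowPair, similar_pairs, exact_equal and NearlyEqual, the min_matches parameter, and the tolerance value are all hypothetical and not part of the original answer.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result record: 1-based row indices plus the match count.
struct RowPair {
    std::size_t row_i;
    std::size_t row_j;
    int matches;
};

// Single pass over all row pairs. Hits are appended to a growing vector,
// so no separate counting pass is needed to size the result.
// `eq` decides whether two cell values count as equal.
template <typename T, typename Eq>
std::vector<RowPair> similar_pairs(const std::vector<std::vector<T>>& M,
                                   int min_matches, Eq eq) {
    std::vector<RowPair> hits;
    const std::size_t n = M.size();
    for (std::size_t i = 0; i + 1 < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            assert(M[i].size() == M[j].size());  // rows must have equal length
            int matches = 0;
            for (std::size_t k = 0; k < M[i].size(); ++k) {
                if (eq(M[i][k], M[j][k])) ++matches;
            }
            if (matches >= min_matches) hits.push_back({i + 1, j + 1, matches});
        }
    }
    return hits;
}

// Exact comparison, matching the == test used for integers.
inline bool exact_equal(int a, int b) { return a == b; }

// Tolerance-based comparison for doubles. The tolerance is chosen by the
// caller; any particular value is an assumption about the data.
struct NearlyEqual {
    double tol;
    bool operator()(double a, double b) const { return std::fabs(a - b) <= tol; }
};
```

Calling similar_pairs(M, 4, exact_equal) corresponds to the count2>3 condition in the code above. For a numeric matrix, passing something like NearlyEqual{1e-9} compares within a tolerance instead. In an Rcpp version, the collected hits would still have to be copied into a matrix before returning to R, and a very large number of hits can still exhaust memory.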

answered Oct 25 '22 13:10

Roland
