How to find complementary rows in matrix in R

Tags: r, matrix

I have this matrix:

      [,1] [,2] [,3] [,4]
 [1,]    1    0    0    0
 [2,]    0    1    0    0
 [3,]    0    0    1    0
 [4,]    0    0    0    1
 [5,]    1    1    0    0
 [6,]    0    0    1    1
 [7,]    1    0    1    0
 [8,]    0    1    0    1
 [9,]    1    1    1    1

So, there are some rows that are complementary. In this matrix these are:

[5,]    1    1    0    0
[6,]    0    0    1    1

and

[7,]    1    0    1    0
[8,]    0    1    0    1

What I want to do is to find these complementary rows and keep just the first one of them. The expected output should be this:

      [,1] [,2] [,3] [,4]
 [1,]    1    0    0    0
 [2,]    0    1    0    0
 [3,]    0    0    1    0
 [4,]    0    0    0    1
 [5,]    1    1    0    0
 [6,]    1    0    1    0
 [7,]    1    1    1    1

Is there a way to do this in R?

asked Apr 17 '20 15:04

Alex

2 Answers

If your matrix is called m:

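# find complementary rows: for a 0/1 matrix, two rows are complementary
# exactly when their Manhattan distance equals the number of columns
dists <- as.matrix(dist(m, method = "manhattan"))
equals <- which(dists == ncol(m), arr.ind = TRUE, useNames = FALSE)

# remove symmetry (5,6 == 6,5); drop = FALSE keeps a matrix even for a single pair
equals <- equals[equals[,1] < equals[,2], , drop = FALSE]
to_drop <- unique(equals[,2])

# m[-integer(0), ] would return zero rows, so only subset when a pair was found
if (length(to_drop) > 0) m <- m[-to_drop, , drop = FALSE]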

This uses the Manhattan distance to find pairs of rows whose summed absolute differences equal the number of columns. For a matrix that contains only 0s and 1s, that means the two rows differ in every position, i.e. they are complementary.
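To make the distance argument concrete, here is a small sketch on the matrix from the question. It assumes the matrix holds only 0s and 1s; the names `d` and `pairs` are just illustrative and do not come from the answer.

```r
# The matrix from the question, entered row by row
m <- matrix(c(1, 0, 0, 0,
              0, 1, 0, 0,
              0, 0, 1, 0,
              0, 0, 0, 1,
              1, 1, 0, 0,
              0, 0, 1, 1,
              1, 0, 1, 0,
              0, 1, 0, 1,
              1, 1, 1, 1), ncol = 4, byrow = TRUE)

d <- as.matrix(dist(m, method = "manhattan"))

d[5, 6]  # rows 5 and 6 differ in all 4 columns, so the distance is 4 == ncol(m)
d[5, 7]  # rows 5 and 7 differ in only 2 columns, so they are not complementary

# All complementary pairs, each listed once (smaller row index first)
pairs <- which(d == ncol(m), arr.ind = TRUE, useNames = FALSE)
pairs <- pairs[pairs[, 1] < pairs[, 2], , drop = FALSE]
pairs  # rows (5, 6) and (7, 8) for this matrix
```

The second column of `pairs` holds the rows that would be dropped, which is why rows 6 and 8 disappear from the expected output.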

answered Oct 31 '22 04:10

Bas

Base R is all that is needed to run this code.

Sample data:

mydata<- matrix(c(1,0,0,0,1,0,1,0,1,0,1,0,0,1,0,0,1,1,0,0,1,0,0,1,1,0,1,0,0,0,1,0,1,0,1,1),ncol=4)

Code

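i <- 1
# nrow(mydata) is re-evaluated on every pass, so the loop adapts as rows are removed
while (i <= nrow(mydata)) {
  # add row i to every row; a complementary row then sums to 1 in every column
  test <- matrix(rep(mydata[i, ], nrow(mydata)), nrow = nrow(mydata), byrow = TRUE) + mydata
  RowsToRemove <- which(apply(test == 1, 1, all))
  if (length(RowsToRemove) != 0) {
    # drop = FALSE keeps mydata a matrix even if only one row remains
    mydata <- mydata[-RowsToRemove, , drop = FALSE]
  }
  i <- i + 1
}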

Output

> mydata
     [,1] [,2] [,3] [,4]
[1,]    1    0    0    0
[2,]    0    1    0    0
[3,]    0    0    1    0
[4,]    0    0    0    1
[5,]    1    1    0    0
[6,]    1    0    1    0
[7,]    1    1    1    1
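The loop above relies on one fact: for rows made up of 0s and 1s, a row plus its complement equals 1 in every column. A minimal illustration, using three rows from the question (the names `row5`, `row6` and `row7` are only for this sketch):

```r
# Rows 5, 6 and 7 of the question's matrix
row5 <- c(1, 1, 0, 0)
row6 <- c(0, 0, 1, 1)
row7 <- c(1, 0, 1, 0)

row5 + row6              # 1 1 1 1: every column sums to 1
all(row5 + row6 == 1)    # TRUE, so row 6 is the complement of row 5
all(row5 + row7 == 1)    # FALSE: the sums are 2 1 1 0
```

Because each row is only compared with the rows still present, the first row of every complementary pair is the one that survives.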

answered Oct 31 '22 06:10

Daniel O
