I have this matrix:
[,1] [,2] [,3] [,4]
[1,] 1 0 0 0
[2,] 0 1 0 0
[3,] 0 0 1 0
[4,] 0 0 0 1
[5,] 1 1 0 0
[6,] 0 0 1 1
[7,] 1 0 1 0
[8,] 0 1 0 1
[9,] 1 1 1 1
So, there are some rows that are complementary. In this matrix these are:
[5,] 1 1 0 0
[6,] 0 0 1 1
and
[7,] 1 0 1 0
[8,] 0 1 0 1
What I want to do is find these complementary rows and keep only the first row of each pair. The expected output is:
[,1] [,2] [,3] [,4]
[1,] 1 0 0 0
[2,] 0 1 0 0
[3,] 0 0 1 0
[4,] 0 0 0 1
[5,] 1 1 0 0
[6,] 1 0 1 0
[7,] 1 1 1 1
Is there a way to do this in R?
To match the rows of one matrix to the rows of another, you can combine t and data.frame: t transposes the rows into columns, and data.frame then creates a list of the columns of the transposed matrix, so each original row can be compared as a whole. Alternatively, the function row.match in the package prodlim identifies the rows of one matrix that are also found (identical) in another matrix. It is convenient and easy to use.
The value-matching function match in R is very useful, but as far as I understand it does not directly support two- or higher-dimensional inputs. For example, assume x and y are matrices with the same number of columns, and I would like to match the rows of x to the rows of y.
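The row-matching idea can be sketched as follows. The original code is not shown in the text, so this is an assumption about what was meant: each row is turned into a list element via t and data.frame, and match then compares whole rows. The matrices x and y are made up for illustration.

```r
# Hypothetical example matrices with the same number of columns
x <- matrix(c(1, 0,
              0, 1,
              1, 1), ncol = 2, byrow = TRUE)
y <- matrix(c(0, 1,
              1, 1,
              0, 0), ncol = 2, byrow = TRUE)

# t() turns rows into columns; data.frame() makes each column a list element,
# so match() compares whole rows instead of single values
row_idx <- match(data.frame(t(x)), data.frame(t(y)))

# row_idx[i] is the index of the row of y equal to row i of x, or NA if none
print(row_idx)
```

For the question above, the same idea could in principle be applied to m and 1 - m to locate the complement of each row, assuming m contains only 0s and 1s.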
If your matrix is called m:
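# find complementary rows (Manhattan distance equal to the number of columns)
dists <- as.matrix(dist(m, method = "manhattan"))
equals <- which(dists == ncol(m), arr.ind = TRUE, useNames = FALSE)
# remove symmetry (5,6 == 6,5); drop = FALSE keeps a matrix even for a single pair
equals <- equals[equals[, 1] < equals[, 2], , drop = FALSE]
to_drop <- unique(equals[, 2])
# guard: m[-integer(0), ] would return a matrix with zero rows
if (length(to_drop) > 0) m <- m[-to_drop, , drop = FALSE]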
This uses the Manhattan distance to find pairs of rows for which the sum of the absolute differences equals the number of columns. For a matrix of 0s and 1s, that means every element differs, so the two rows are complementary.
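To illustrate why the distance test works, here is a small sketch with made-up 0/1 vectors. The helper manhattan is a hypothetical name introduced only for this example; it is not part of base R.

```r
# Two complementary 0/1 rows and one non-complementary row (illustrative values)
a  <- c(1, 1, 0, 0)
b  <- c(0, 0, 1, 1)
c3 <- c(1, 0, 1, 0)

# Hypothetical helper: Manhattan distance = sum of absolute element-wise differences
manhattan <- function(u, v) sum(abs(u - v))

d_ab <- manhattan(a, b)    # every position differs, so this equals length(a)
d_ac <- manhattan(a, c3)   # only some positions differ, so this is smaller

# dist() computes the same quantity for all pairs of rows at once
d_all <- as.matrix(dist(rbind(a, b, c3), method = "manhattan"))
print(d_all)
```

Note that this equivalence relies on the entries being only 0 or 1; with other values, a distance equal to the number of columns would not imply complementary rows.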
Base R is all that is needed to run this code.
Sample data:
mydata<- matrix(c(1,0,0,0,1,0,1,0,1,0,1,0,0,1,0,0,1,1,0,0,1,0,0,1,1,0,1,0,0,0,1,0,1,0,1,1),ncol=4)
Code
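i <- 1
while (i <= nrow(mydata)) {
  # add row i to every row; a complementary row sums to 1 in every column
  test <- matrix(rep(mydata[i, ], nrow(mydata)), nrow = nrow(mydata), byrow = TRUE) + mydata
  RowsToRemove <- which(rowSums(test == 1) == ncol(mydata))
  if (length(RowsToRemove) != 0) {
    # drop = FALSE keeps mydata a matrix even if a single row remains
    mydata <- mydata[-RowsToRemove, , drop = FALSE]
  }
  i <- i + 1
}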
Output
> mydata
[,1] [,2] [,3] [,4]
[1,] 1 0 0 0
[2,] 0 1 0 0
[3,] 0 0 1 0
[4,] 0 0 0 1
[5,] 1 1 0 0
[6,] 1 0 1 0
[7,] 1 1 1 1
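As a quick sanity check, the two approaches from the answers can be wrapped in functions and compared on the sample data. This is only a sketch: the function names drop_by_distance and drop_by_loop are hypothetical, the loop version uses sweep instead of building the repeated matrix by hand, and the comparison assumes a 0/1 matrix without duplicated rows (with duplicates the two approaches may keep different rows).

```r
# Sample data from the answer above (9 rows, 4 columns of 0/1 values)
m <- matrix(c(1,0,0,0,1,0,1,0,1,0,1,0,0,1,0,0,1,1,
              0,0,1,0,0,1,1,0,1,0,0,0,1,0,1,0,1,1), ncol = 4)

# Hypothetical wrapper around the distance-based approach
drop_by_distance <- function(m) {
  d <- as.matrix(dist(m, method = "manhattan"))
  pairs <- which(d == ncol(m), arr.ind = TRUE, useNames = FALSE)
  pairs <- pairs[pairs[, 1] < pairs[, 2], , drop = FALSE]
  to_drop <- unique(pairs[, 2])
  if (length(to_drop) > 0) m <- m[-to_drop, , drop = FALSE]
  m
}

# Hypothetical wrapper around the loop-based approach
drop_by_loop <- function(m) {
  i <- 1
  while (i <= nrow(m)) {
    # a row is complementary to row i when the element-wise sum is 1 everywhere
    sums <- sweep(m, 2, m[i, ], "+")
    to_drop <- which(rowSums(sums == 1) == ncol(m))
    if (length(to_drop) > 0) m <- m[-to_drop, , drop = FALSE]
    i <- i + 1
  }
  m
}

res_dist <- drop_by_distance(m)
res_loop <- drop_by_loop(m)

# TRUE when both approaches keep the same rows
print(identical(res_dist, res_loop))
```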