I have a matrix that looks like this:
1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 0 0 0 0 0 0
0 0 1 1 0 0 0 0 1 1 0 0
1 1 0 0 0 0 1 1 0 0 0 0
0 0 1 1 0 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 0 0 0 0
You can see every two columns are identical, indicating the "group membership" of the design matrix. Now my question is, how can I convert this rank-deficient matrix (rank = 6) into a full-rank matrix automatically in R? This case may be a little bit special, i.e. I can delete duplicate columns manually. I am just curious if there is an approach that solve the problem "more generally". Thanks!
The maximum number of linearly independent vectors in a matrix is equal to the number of non-zero rows in its row echelon matrix.
The columns of matrix A are linearly independent if and only if the equation Ax = 0 has only the trivial solution.
Given a set of vectors, you can determine if they are linearly independent by writing the vectors as the columns of the matrix A, and solving Ax = 0. If there are any non-zero solutions, then the vectors are linearly dependent. If the only solution is x = 0, then they are linearly independent.
The columns (or rows) of a matrix are linearly dependent when the number of columns (or rows) is greater than the rank, and are linearly independent when the number of columns (or rows) is equal to the rank. The maximum number of linearly independent rows equals the maximum number of linearly independent columns.
I think the way R does QR decomposition this works (and by works I mean leaves a set of independent columns):
m[, qr(m)$pivot[seq_len(qr(m)$rank)]]
On the example from OP:
m = structure(c(1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L,
1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 0L, 0L,
0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L,
1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L), .Dim = c(6L, 12L
))
m[, qr(m)$pivot[seq_len(qr(m)$rank)]]
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 1 1 1 1 1 1
#[2,] 1 1 1 0 0 0
#[3,] 0 1 0 0 1 0
#[4,] 1 0 0 1 0 0
#[5,] 0 1 0 0 0 0
#[6,] 1 0 0 0 0 0
Try:
X[,duplicated(cor(X))]
cor(x) computes the correlation matrix of x. If two columns are linearly dependent to each other they'll have the same column in the correlation matrix
This will get rid of the columns that are a linear transformations of a single other column.
If you're looking for row reduced echelon form instead, which will show you if a column is a linear combination of multiple other columns, check out this answer:
Reduced row echelon form
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With