extract maximal set of independent columns from a matrix [closed]

Tags:

matrix

I have a matrix that looks like this:

 1 1 1 1 1 1 1 1 1  1  1  1
 1 1 1 1 1 1 0 0 0  0  0  0
 0 0 1 1 0 0 0 0 1  1  0  0
 1 1 0 0 0 0 1 1 0  0  0  0
 0 0 1 1 0 0 0 0 0  0  0  0
 1 1 0 0 0 0 0 0 0  0  0  0

You can see that the columns come in identical pairs, indicating the "group membership" of the design matrix. My question is: how can I convert this rank-deficient matrix (rank = 6) into a full-rank matrix automatically in R? This case may be a little special, in that I can delete the duplicate columns manually. I am just curious whether there is an approach that solves the problem more generally. Thanks!

asked Sep 30 '13 17:09

alittleboy

2 Answers

I think this works because of the way R does QR decomposition (by "works" I mean it leaves a set of independent columns):

m[, qr(m)$pivot[seq_len(qr(m)$rank)]]

On the example from OP:

m = structure(c(1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 
1L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 0L, 0L, 
0L, 1L, 1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 
1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L), .Dim = c(6L, 12L
))

m[, qr(m)$pivot[seq_len(qr(m)$rank)]]
#     [,1] [,2] [,3] [,4] [,5] [,6]
#[1,]    1    1    1    1    1    1
#[2,]    1    1    1    0    0    0
#[3,]    0    1    0    0    1    0
#[4,]    1    0    0    1    0    0
#[5,]    0    1    0    0    0    0
#[6,]    1    0    0    0    0    0
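
The same idea can be wrapped in a small helper that calls qr() only once. This is a sketch rather than part of the original answer: the name independent_cols and the toy matrix x are assumptions for illustration, and tol simply mirrors the default of qr(). It relies on the default (LINPACK) qr(), which moves columns it finds dependent to the end of $pivot.

```r
# Hypothetical helper: keep one maximal set of linearly independent columns.
independent_cols <- function(m, tol = 1e-07) {
  decomp <- qr(m, tol = tol)                  # default LINPACK QR, limited pivoting
  keep <- decomp$pivot[seq_len(decomp$rank)]  # first `rank` pivots index independent columns
  m[, sort(keep), drop = FALSE]               # sort() keeps the original column order
}

# Toy matrix (an assumption): b duplicates a, and d = a + c, so the rank is 2.
x <- cbind(a = c(1, 0, 1), b = c(1, 0, 1), c = c(0, 1, 1), d = c(1, 1, 2))
reduced <- independent_cols(x)

# The reduced matrix has as many columns as the rank of x and is full rank.
ncol(reduced) == qr(x)$rank
qr(reduced)$rank == ncol(reduced)
```

Note that with LAPACK = TRUE, qr() pivots differently and always reports full rank, so this sketch assumes the default setting.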

answered Oct 04 '22 16:10

eddi

Try:

X[, !duplicated(cor(X))]

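cor(X) computes the correlation matrix of the columns of X. If one column is an exact copy (or a positively scaled copy) of another, the two have identical rows in the correlation matrix, so duplicated() flags the later one. Keep in mind that this compares floating-point values exactly, and that a constant column has no defined correlation.

This gets rid of columns that are a linear transformation of a single other column. It does not catch a column that is a linear combination of several other columns.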
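
A small sketch of that limitation, using a made-up matrix X (its values and column names are assumptions for illustration). The negated form !duplicated() is used so that the duplicates are dropped rather than selected:

```r
# Toy matrix (an assumption): b is an exact copy of a, and d = a + c.
X <- cbind(a = c(1, 0, 1, 0),
           b = c(1, 0, 1, 0),
           c = c(0, 1, 1, 0),
           d = c(1, 1, 2, 0))

# Drop columns whose row of the correlation matrix repeats an earlier row.
reduced <- X[, !duplicated(cor(X)), drop = FALSE]

colnames(reduced)   # the exact copy b is gone, but d is still present
qr(reduced)$rank    # smaller than ncol(reduced): still rank-deficient
```

Here the pairwise check removes the copy but leaves d, because d depends on two columns at once; a rank-based method such as qr() is needed for that case.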

If you're looking for reduced row echelon form instead, which will show you whether a column is a linear combination of multiple other columns, check out this answer:

Reduced row echelon form

answered Oct 04 '22 16:10

kith
