The recommended way to calculate the rank of a matrix in R seems to be qr
:
X <- matrix(c(1, 2, 3, 4), ncol = 2, byrow=T)
Y <- matrix(c(1.0, 1, 1, 1), ncol = 2, byrow=T)
qr(X)$rank
[1] 2
qr(Y)$rank
[1] 1
I was able to improve efficiency by modifying this function for my specific case:
qr2 <- function (x, tol = 1e-07) {
if (!is.double(x))
storage.mode(x) <- "double"
p <- as.integer(2)
n <- as.integer(2)
res <- .Fortran("dqrdc2", qr = x, n, n, p, as.double(tol),
rank = integer(1L), qraux = double(p), pivot = as.integer(1L:p),
double(2 * p), PACKAGE = "base")[c(1, 6, 7, 8)]
class(res) <- "qr"
res}
qr2(X)$rank
[1] 2
qr2(Y)$rank
[1] 1
library(microbenchmark)
microbenchmark(qr(X)$rank,qr2(X)$rank,times=1000)
Unit: microseconds
expr min lq median uq max
1 qr(X)$rank 41.577 44.041 45.580 46.812 1302.091
2 qr2(X)$rank 19.403 21.251 23.099 24.331 80.997
Using R, is it possible to calculate the rank of a 2*2 matrix even faster?
Detailed Solution. here its Rank is 2 because the number of non-zero rows in the echelon form of the given matrix is 2. Hence, the rank is 2.
The maximum number of linearly independent vectors in a matrix is equal to the number of non-zero rows in its row echelon matrix. Therefore, to find the rank of a matrix, we simply transform the matrix to its row echelon form and count the number of non-zero rows.
This matrix has two rows and three columns. Therefore, the rank of 𝐴 must be less than or equal to the smaller of these numbers, which is two. Recall also that the rank of 𝐴 is equal to zero if and only if 𝐴 is the zero matrix.
Since the determinant of the matrix is zero, its rank cannot be equal to the number of rows/columns, 2. The only remaining possibility is that the rank of the matrix is 1, which we do not need to verify by taking any further determinants. Therefore, the rank of the matrix is 1.
We can do even better using RcppEigen.
// [[Rcpp::depends(RcppEigen)]]
#include <RcppEigen.h>
using namespace Rcpp;
using Eigen::Map;
using Eigen::MatrixXd;
using Eigen::FullPivHouseholderQR;
typedef Map<MatrixXd> MapMatd;
//calculate rank of a matrix using QR decomposition with pivoting
// [[Rcpp::export]]
int rankEigen(NumericMatrix m) {
const MapMatd X(as<MapMatd>(m));
FullPivHouseholderQR<MatrixXd> qr(X);
qr.setThreshold(1e-7);
return qr.rank();
}
Benchmarks:
microbenchmark(rankEigen(X), qr3(X),times=1000)
Unit: microseconds
expr min lq median uq max neval
rankEigen(X) 1.849 2.465 2.773 3.081 18.171 1000
qr3(X) 5.852 6.469 7.084 7.392 48.352 1000
However, the tolerance is not exactly the same as in LINPACK, because of different tolerance definitions:
test <- sapply(1:200, function(i) {
Y <- matrix(c(10^(-i), 10^(-i), 10^(-i), 10^(-i)), ncol = 2, byrow=T)
qr3(Y) == rankEigen(Y)
})
which.min(test)
#[1] 159
The threshold in FullPivHouseholderQR is defined as:
A pivot will be considered nonzero if its absolute value is strictly greater than |pivot|≤ threshold * |maxpivot| where maxpivot is the biggest pivot.
Sure, just get rid of more stuff you don't need (because you know what the values are), don't do any checks, set DUP=FALSE
, and only return what you want:
qr3 <- function (x, tol = 1e-07) {
.Fortran("dqrdc2", qr=x*1.0, 2L, 2L, 2L, tol*1.0,
rank = 0L, qraux = double(2L), pivot = c(1L,2L),
double(4L), DUP = FALSE, PACKAGE = "base")[[6L]]
}
microbenchmark(qr(X)$rank,qr2(X)$rank,qr3(X),times=1000)
# Unit: microseconds
# expr min lq median uq max
# 1 qr(X)$rank 33.303 34.2725 34.9720 35.5180 737.599
# 2 qr2(X)$rank 18.334 18.9780 19.4935 19.9240 38.063
# 3 qr3(X) 6.536 7.2100 8.3550 8.5995 657.099
I'm not an advocate of removing checks, but they do slow things down. x*1.0
and tol*1.0
ensure doubles, so that's kind-of a check and adds a little overhead. Also note that DUP=FALSE
can potentially be dangerous, since you can alter the input object(s).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With