The problem is pretty silly, but I am wondering if I am missing something.
Let's say that there is a vector k
that contains some numbers, say
> k
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
I want to transform this to a matrix
> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    0    6    7    8    9
[3,]    0    0   10   11   12
[4,]    0    0    0   13   14
[5,]    0    0    0    0   15
My first idea was to use something with upper.tri(), for example m[upper.tri(m, diag = TRUE)] <- k, but that will not give the matrix above: the assignment fills the selected cells column by column, not row by row (see the quick check below).
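A quick check of that (assuming m starts out as a 5x5 zero matrix) shows the values running down the columns of the upper triangle instead of across the rows:
k <- 1:15
m <- matrix(0, 5, 5)
m[upper.tri(m, diag = TRUE)] <- k  # fills the TRUE cells in column-major order
m
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    1    2    4    7   11
# [2,]    0    3    5    8   12
# [3,]    0    0    6    9   13
# [4,]    0    0    0   10   14
# [5,]    0    0    0    0   15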
Is there a more intelligent solution to this? Below is my solution, but let's just say I am not too proud of it.
k <- 1:15
m <- matrix(0, 5, 5)
rows <- rep(1:5, 5:1)   # row index for each element of k
cols1 <- rle(rows)$lengths
cols <- do.call(c, lapply(1:length(cols1), function(x) x:5))   # column index for each element of k
for (i in 1:length(k)) {
  m[rows[i], cols[i]] <- k[i]
}
Here's an option using lower.tri and t to transpose the result:
k <- 1:15
m <- matrix(0, 5,5)
m[lower.tri(m, diag = TRUE)] <- k
m <- t(m)
m
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    1    2    3    4    5
# [2,]    0    6    7    8    9
# [3,]    0    0   10   11   12
# [4,]    0    0    0   13   14
# [5,]    0    0    0    0   15
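To see why this works, here is a rough illustration (using a throwaway copy m0 and reusing k from the snippet above) of what the matrix looks like just before the t() call: the lower-triangle assignment lays the values out column by column, which is the desired row-by-row upper triangle mirrored across the diagonal.
m0 <- matrix(0, 5, 5)
m0[lower.tri(m0, diag = TRUE)] <- k
m0  # before transposing
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    1    0    0    0    0
# [2,]    2    6    0    0    0
# [3,]    3    7   10    0    0
# [4,]    4    8   11   13    0
# [5,]    5    9   12   14   15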
Microbenchmark
Since there was some confusion with Joseph's benchmark, here's another one. I tested the three solutions for matrices of size 10*10; 100*100; 1000*1000; 10000*10000.
Results:
Apparently, the performance depends heavily on the size of the matrix. For large matrices, Joseph's answer performs fastest, while for smaller matrices, mine was the fastest approach. Note that this doesn't take memory efficiency into account.
Reproducible benchmark:
Joseph <- function(k, n) {
  y <- 1L
  t <- rep(0L, n)
  # j[x] is the position in k where row x of the upper triangle starts
  j <- c(y, sapply(1:(n-1L), function(x) y <<- y + (n+1L) - x))
  # build each row as (x-1) zeros followed by its slice of k, then transpose
  t(vapply(1:n, function(x) c(rep(0L, x-1L), k[j[x]:(j[x]+n-x)]), t, USE.NAMES = FALSE))
}
Frank <- function(k, n) {
  m = matrix(0L, n, n)
  m[ which(lower.tri(m, diag=TRUE), arr.ind=TRUE)[, 2:1] ] = k
  m
}
docendo <- function(k, n) {
  m <- matrix(0L, n, n)
  m[lower.tri(m, diag = TRUE)] <- k
  t(m)
}
library(microbenchmark)
library(data.table)
library(ggplot2)
n <- c(10L, 100L, 1000L, 10000L)
k <- lapply(n, function(x) seq.int((x^2 + x)/2))
b <- lapply(seq_along(n), function(i) {
  bm <- microbenchmark(Joseph(k[[i]], n[i]), Frank(k[[i]], n[i]), docendo(k[[i]], n[i]), times = 10L)
  bm$n <- n[i]
  bm
})
b1 <- rbindlist(b)
ggplot(b1, aes(expr, time)) +
  geom_violin() +
  facet_wrap(~ n, scales = "free_y") +
  ggtitle("Benchmark for n = c(10L, 100L, 1000L, 10000L)")
Check equality of results:
all.equal(Joseph(k[[1]], n[1]), Frank(k[[1]], n[1]))
#[1] TRUE
all.equal(Joseph(k[[1]], n[1]), docendo(k[[1]], n[1]))
#[1] TRUE
Note: I didn't include George's approach in the comparison since, judging by Joseph's results, it seems to be a lot slower. So all approaches compared in my benchmark are written only in base R.
A variation on @docendodiscimus' answer: instead of transposing, you can swap the row and column indices, which you get by wrapping lower.tri in which:
n = 5
m = matrix(0, n, n)
m[ which(lower.tri(m, diag=TRUE), arr.ind=TRUE)[, 2:1] ] = seq(sum(seq(n)))
m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    0    6    7    8    9
[3,]    0    0   10   11   12
[4,]    0    0    0   13   14
[5,]    0    0    0    0   15
To understand how it works, look at the left-hand side in steps:
lower.tri(m, diag=TRUE)
which(lower.tri(m, diag=TRUE), arr.ind=TRUE)
which(lower.tri(m, diag=TRUE), arr.ind=TRUE)[, 2:1]
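As a rough sketch (assuming m is the 5x5 matrix from the snippet above), here is what the first few rows of those intermediate objects look like:
lower.tri(m, diag = TRUE)   # 5x5 logical matrix: TRUE on and below the diagonal

head(which(lower.tri(m, diag = TRUE), arr.ind = TRUE), 3)
#      row col
# [1,]   1   1
# [2,]   2   1
# [3,]   3   1

head(which(lower.tri(m, diag = TRUE), arr.ind = TRUE)[, 2:1], 3)
#      col row
# [1,]   1   1
# [2,]   1   2
# [3,]   1   3
Reading the swapped pairs as (row, column) indices, they walk the upper triangle row by row, i.e. exactly the order in which k should be placed.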
I guess transposing might be costly if the matrix is large, which is why I'd consider this option. Note: Joseph Wood's answer suggests that I am wrong, since the transposing way is faster in his benchmark.
(Thanks to @JosephWood:) Instead of enumerating and summing with sum(seq(n)), you can use (n^2 - n)/2 + n.
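A quick sanity check of that closed form (it is just the triangular number n*(n+1)/2):
n <- 5
sum(seq(n))       # 15
(n^2 - n)/2 + n   # 15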
library(miscTools)
k <- 1:15
triang(k, 5)