Few requirements.
Before posting your answer please!!
1) Make sure that your function does not give errors with other data, simulate several similar matrices. (turn off the seed)
2) Make sure your function is faster than mine
3) Make sure that your function works exactly the same as mine, simulate it on different matrices (turn off the seed)
for example
for(i in 1:500){
m <- matrix(sample(c(F,T),30,T),ncol = 3) ; colnames(m) <- paste0("x",1:ncol(m))
res <- c(my_fun(m),your_function(m))
print(res)
if(sum(res)==1) break
}
m
4) the function should work with a matrix with any number of rows and columns
==========================================================
The function looks for a true in the first column of the logical matrix, if a true is found, go to column 2 and a new row, and so on..
If the sequence is found return true if not false
set.seed(15)
m <- matrix(sample(c(F,T),30,T),ncol = 3) ; colnames(m) <- paste0("x",1:ncol(m))
m
x1 x2 x3
[1,] FALSE TRUE TRUE
[2,] FALSE FALSE FALSE
[3,] TRUE TRUE TRUE
[4,] TRUE TRUE TRUE
[5,] FALSE FALSE FALSE
[6,] TRUE TRUE FALSE
[7,] FALSE TRUE FALSE
[8,] FALSE FALSE FALSE
[9,] FALSE FALSE TRUE
[10,] FALSE FALSE TRUE
my slow example function
find_seq <- function(m){
colum <- 1
res <- rep(FALSE,ncol(m))
for(i in 1:nrow(m)){
if(m[i,colum]==TRUE){
res[colum] <- TRUE
print(c(row=i,col=colum))
colum <- colum+1}
if(colum>ncol(m)) break
}
all(res)
}

find_seq(m)
row col
3 1
row col
4 2
row col
9 3
[1] TRUE
how to make it as fast as possible?
UPD=========================
microbenchmark::microbenchmark(Jean_Claude_Arbaut_fun(m),
+ ThomasIsCoding_fun(m),
+ my_fun(m))
Unit: microseconds
expr min lq mean median uq max neval cld
Jean_Claude_Arbaut_fun(m) 2.850 3.421 4.36179 3.9915 4.5615 27.938 100 a
ThomasIsCoding_fun(m) 14.824 15.965 17.92030 16.5350 17.1050 101.489 100 b
my_fun(m) 23.946 24.517 25.59461 25.0880 25.6580 42.192 100 c
If you are pursuing the speed, you can try the following base R solution
TIC_fun <- function(m) {
p <- k <- 1
nr <- nrow(m)
nc <- ncol(m)
repeat {
if (p > nr) {
return(FALSE)
}
found <- FALSE
for (i in p:nr) {
if (m[i, k]) {
# print(c(row = i, col = k))
p <- i + 1
k <- k + 1
found <- TRUE
break
}
}
if (!found) {
return(FALSE)
}
if (k > nc) {
return(TRUE)
}
}
}
and you will see
Unit: microseconds
expr min lq mean median uq max neval
my_fun(m) 18.600 26.3010 41.46795 41.5510 44.3010 121.302 100
TIC_fun(m) 10.201 14.1515 409.89394 22.6505 24.4005 38906.601 100
You can try the code below
lst <- with(as.data.frame(which(m, arr.ind = TRUE)), split(row, col))
# lst <- apply(m, 2, which)
setNames(
stack(
setNames(
Reduce(function(x, y) y[y > x][1],
lst,
init = -Inf,
accumulate = TRUE
)[-1],
names(lst)
)
),
c("row", "col")
)
which gives
row col
1 3 1
2 4 2
3 9 3
A more interesting implementation might be using the recursions (just for fun, but not recommanded due to the inefficiency)
f <- function(k) {
if (k == 1) {
return(data.frame(row = which(m[, k])[1], col = k))
}
s <- f(k - 1)
for (i in (tail(s, 1)$row + 1):nrow(m)) {
if (m[i, k]) {
return(rbind(s, data.frame(row = i, col = k)))
}
}
}
and which gives
> f(ncol(m))
row col
1 3 1
2 4 2
3 9 3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With