I'm looking for a better way to find the larger nchar of two strings that I'm comparing with each other.
Say I have strings in a sentenceMatch data.frame and I need to create a matrix of max(nchar(string1), nchar(string2)) for every pair of strings, but without a for loop, which is very slow.
sentenceMatch <- data.frame(Sentence=c("hello how are you",
"hello how are you friend",
"im fine and how about you",
"good thanks",
"great to hear that"))
sentenceMatch$Sentence <- as.character(sentenceMatch$Sentence)
overallMatrix_nchar <- matrix(, nrow = dim(sentenceMatch)[1], ncol = dim(sentenceMatch)[1])
for (k in 1:dim(sentenceMatch)[1]) {
  for (l in 1:dim(sentenceMatch)[1]) {
    overallMatrix_nchar[k, l] <- max(nchar(sentenceMatch[k, ]), nchar(sentenceMatch[l, ]))
  }
}
Is there a better way to speed up this computation? Thanks a lot in advance for any help.
Use outer:
nc <- nchar(sentenceMatch[[1]])
outer(nc, nc, pmax)
giving:
     [,1] [,2] [,3] [,4] [,5]
[1,]   17   24   25   17   18
[2,]   24   24   25   24   24
[3,]   25   25   25   25   25
[4,]   17   24   25   11   18
[5,]   18   24   25   18   18
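To see why pmax is the function to pass here: outer() does not call the function once per pair. It expands both inputs into long vectors and calls the function a single time, so the function has to be vectorised. A minimal sketch of that expansion, with the character counts from the example hard-coded (the names x, y and by_hand are only for illustration):

```r
# Character counts of the five example sentences
nc <- c(17L, 24L, 25L, 11L, 18L)
n  <- length(nc)

# Roughly what outer() does internally: recycle the first argument,
# repeat each element of the second, then call the function once
x <- rep(nc, times = n)
y <- rep(nc, each = n)

# pmax() compares element by element, so it returns n * n values
by_hand <- matrix(pmax(x, y), nrow = n)

# Should agree with the one-liner
all.equal(by_hand, outer(nc, nc, pmax))
```

Plain max would collapse all of those values into a single number, and outer() would then stop with an error because the result cannot be shaped into an n by n matrix.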
sentences <- c("hello how are you",
"hello how are you friend",
"im fine and how about you",
"good thanks",
"great to hear that")
sn <- nchar(sentences)
n <- length(sn)
M1 <- matrix(sn, n, n)
M2 <- t(M1)
(M1 + M2 + abs(M1 - M2)) / 2
#      [,1] [,2] [,3] [,4] [,5]
# [1,]   17   24   25   17   18
# [2,]   24   24   25   24   24
# [3,]   25   25   25   25   25
# [4,]   17   24   25   11   18
# [5,]   18   24   25   18   18
Here I use the fact that max(x, y) = (x + y + abs(x - y)) / 2. Performance is very similar to the outer approach:
set.seed(1)
sentences <- replicate(paste0(rep("a", rpois(1, 3000)), collapse = ""), n = 1000)
f1 <- function(sentences) {
  sn <- nchar(sentences)
  n <- length(sn)
  M1 <- matrix(sn, n, n)
  M2 <- t(M1)
  (M1 + M2 + abs(M1 - M2)) / 2
}
f2 <- function(sentences) {
  nc <- nchar(sentences)
  outer(nc, nc, pmax)
}
library(microbenchmark)
microbenchmark(f1(sentences), f2(sentences))
# Unit: milliseconds
#           expr      min       lq    mean   median       uq      max neval cld
#  f1(sentences) 33.39924 37.66673 57.9912 42.45684 82.01905 122.5075   100   b
#  f2(sentences) 31.59887 34.97866 50.5065 37.82217 77.82042 103.6342   100  a
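Before comparing timings it is worth confirming that both functions return the same numbers. The following is a small sketch on the five example sentences, not part of the benchmark itself. One detail to be aware of: f1 returns doubles (because of the division by 2), while f2 keeps the integer type that nchar produces, so the comparison uses all.equal rather than identical.

```r
sentences <- c("hello how are you",
               "hello how are you friend",
               "im fine and how about you",
               "good thanks",
               "great to hear that")

# Arithmetic version: max(x, y) = (x + y + abs(x - y)) / 2
f1 <- function(sentences) {
  sn <- nchar(sentences)
  n <- length(sn)
  M1 <- matrix(sn, n, n)
  M2 <- t(M1)
  (M1 + M2 + abs(M1 - M2)) / 2
}

# outer() version with the vectorised pmax()
f2 <- function(sentences) {
  nc <- nchar(sentences)
  outer(nc, nc, pmax)
}

# Same values, different storage type
all.equal(f1(sentences), f2(sentences))
typeof(f1(sentences))
typeof(f2(sentences))

# Spot check of the identity on a few arbitrary numbers
x <- c(3, 7, 5)
y <- c(4, 2, 5)
all((x + y + abs(x - y)) / 2 == pmax(x, y))
```

If an integer matrix is needed downstream, the result of f1 can be converted with storage.mode(m) <- "integer", where m is a hypothetical variable holding that result.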