
Max nchar from two strings in matrix



I'm looking for a better way to find the larger nchar of two strings that I'm comparing with each other.

Say I have strings in the sentenceMatch data.frame and need to create a matrix of max(nchar(string1), nchar(string2)) for every pair of strings, but without a for loop, which is very slow.

sentenceMatch <- data.frame(Sentence=c("hello how are you",
                                   "hello how are you friend",
                                   "im fine and how about you",
                                   "good thanks",
                                   "great to hear that"))

sentenceMatch$Sentence <- as.character(sentenceMatch$Sentence)

overallMatrix_nchar <- matrix(, nrow = dim(sentenceMatch)[1], ncol = dim(sentenceMatch)[1])

for (k in 1:dim(sentenceMatch)[1]) {
  for (l in 1:dim(sentenceMatch)[1]) {
    overallMatrix_nchar[k, l] <- max(nchar(sentenceMatch[k, ]), nchar(sentenceMatch[l, ]))
  }
}

Is there a better way to speed up this computation? Thanks in advance for any help.

SmithiM asked Dec 10 '22 18:12


2 Answers

Use outer:

nc <- nchar(sentenceMatch[[1]])
outer(nc, nc, pmax)


     [,1] [,2] [,3] [,4] [,5]
[1,]   17   24   25   17   18
[2,]   24   24   25   24   24
[3,]   25   25   25   25   25
[4,]   17   24   25   11   18
[5,]   18   24   25   18   18
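
A minimal sketch of why this works, assuming the same five sentences held in a plain character vector (the names `res` and `ref` are only illustrative): outer() calls its function once on two long vectors containing every (i, j) pairing, so the function has to be vectorised. pmax() is; max() would collapse everything to a single number. The check at the end compares the result against the explicit double loop from the question.

```r
sentences <- c("hello how are you",
               "hello how are you friend",
               "im fine and how about you",
               "good thanks",
               "great to hear that")

# Compute each length once instead of inside the loop
nc <- nchar(sentences)

# outer() passes two expanded vectors to pmax(), which returns
# the element-wise maximum; the result is reshaped to n x n
res <- outer(nc, nc, pmax)

# Reference result from the explicit double loop
n <- length(nc)
ref <- matrix(NA_integer_, n, n)
for (k in seq_len(n)) {
  for (l in seq_len(n)) {
    ref[k, l] <- max(nc[k], nc[l])
  }
}

# Both approaches should give the same matrix
stopifnot(all(res == ref))
```

Computing nchar() once up front also removes the repeated data.frame indexing that makes the original loop slow.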
G. Grothendieck answered Jan 04 '23 01:01

sentences <- c("hello how are you",
               "hello how are you friend",
               "im fine and how about you",
               "good thanks",
               "great to hear that")
sn <- nchar(sentences)
n <- length(sn)
M1 <- matrix(sn, n, n)
M2 <- t(M1)
(M1 + M2 + abs(M1 - M2)) / 2
#      [,1] [,2] [,3] [,4] [,5]
# [1,]   17   24   25   17   18
# [2,]   24   24   25   24   24
# [3,]   25   25   25   25   25
# [4,]   17   24   25   11   18
# [5,]   18   24   25   18   18

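Here I use the fact that max(x, y) = (x + y + abs(x - y)) / 2. M1 holds the length of sentence i in row i and M2, its transpose, holds the length of sentence j in column j, so the expression gives the pairwise maximum. Performance is very similar to the outer approach: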

sentences <- replicate(paste0(rep("a", rpois(1, 3000)), collapse = ""), n = 1000)

f1 <- function(sentences) {
  sn <- nchar(sentences)
  n <- length(sn)
  M1 <- matrix(sn, n, n)
  M2 <- t(M1)
  (M1 + M2 + abs(M1 - M2)) / 2
}

f2 <- function(sentences) {
  nc <- nchar(sentences)
  outer(nc, nc, pmax)
}

library(microbenchmark)
microbenchmark(f1(sentences), f2(sentences))
# Unit: milliseconds
#           expr      min       lq    mean   median       uq      max neval cld
#  f1(sentences) 33.39924 37.66673 57.9912 42.45684 82.01905 122.5075   100   b
#  f2(sentences) 31.59887 34.97866 50.5065 37.82217 77.82042 103.6342   100  a 
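
The identity behind f1 can be checked on a small input. If x >= y then abs(x - y) = x - y and the sum is 2x; otherwise it is 2y, so halving gives the maximum either way. The sketch below assumes the five sentence lengths from the question as input; the variable names are only illustrative. It also shows one practical difference between the two approaches: the division by 2 makes the arithmetic version return doubles, while outer() with pmax() keeps integers.

```r
x <- c(17L, 24L, 25L, 11L, 18L)   # nchar() of the five example sentences
n <- length(x)

M1 <- matrix(x, n, n)   # filled column-wise, so M1[i, j] == x[i]
M2 <- t(M1)             # M2[i, j] == x[j]

via_identity <- (M1 + M2 + abs(M1 - M2)) / 2
via_pmax <- outer(x, x, pmax)

# The values agree element by element
stopifnot(all(via_identity == via_pmax))

typeof(via_identity)   # "double", because of the division by 2
typeof(via_pmax)       # "integer"

# If an integer matrix is needed, convert while keeping the dimensions
as_int <- via_identity
storage.mode(as_int) <- "integer"
```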
Julius Vainora answered Jan 04 '23 03:01