I am trying to find if there is a quick way to search for specific strings in arrays in R, kind of like the game Boggle, except you know the word upfront. You are allowed to move in the following directions for the next letter of the string: up, down, right or left Say for a simple example you have an array of the form: <pre class="prettyprint"><code>> G A, Q, A, Q, Q, A, Q, P, Q, Q, Q, Q, P, L, Q, Q, Q, Q, E, Q </code></pre> And you want to apply a function to G with the string <code>APPLE</code>, for the function to return <code>TRUE</code>, <code>APPLE</code> exists in this array, and <code>FALSE</code> if it doesn't. Does there exist a pre-made function or package that can do this, or alternatively is there a smart way to do it, I'm relatively new to dealing with strings in R and I'm struggling to see a way. Any help much appreciated. Thanks.

this will first check if there are any characters in your word that do not exist within the array and then will check if the number of characters in the array are sufficient to meet repeat letters in your word <pre class="prettyprint"><code>word <- strsplit("APPLE", "") pool <- c("A", "Q", "A", "Q", "Q", "A", "Q", "P", "Q", "Q", "Q", "Q", "P", "L", "Q", "Q", "Q", "Q", "E", "Q") t.word <- table(word) t.pool <- table(pool) length(setdiff(names(t.word), names(t.pool))) == 0 min(t.pool[names(t.word)] - t.word) >= 0 </code></pre> the last two functions will both output <code>TRUE</code> to show that all the letters from <code>word</code> exist in <code>pool</code> and that the count of a single letter in <code>word</code> is not greater than that of <code>pool</code> in function form that will output <code>TRUE</code> if found, otherwise <code>FALSE</code> <pre class="prettyprint"><code>word.find <- function(word, pool) { t.word <- table(strsplit(word, "")) t.pool <- table(pool) length(setdiff(names(t.word), names(t.pool))) == 0 & min(t.pool[names(t.word)] - t.word) >= 0 } word.find("APPLE", pool) [1] TRUE word.find("APPLES", pool) [1] FALSE word.find("APPLEE", pool) [1] FALSE </code></pre>

Adding another approach, having: <pre class="prettyprint"><code>board = structure(c("A", "A", "Q", "Q", "Q", "Q", "Q", "Q", "A", "P", "P", "Q", "Q", "Q", "L", "E", "Q", "Q", "Q", "Q"), .Dim = 4:5, .Dimnames = list( NULL, NULL)) word = "APPLE" </code></pre> we start with: <pre class="prettyprint"><code>matches = lapply(strsplit(word, NULL)[[1]], function(x) which(x == board, arr.ind = TRUE)) </code></pre> which is a simple -probably unavoidable- search of indices of "board" that match each letter of the word. It's a "list" containing the row/col indices like: <pre class="prettyprint"><code>#[[1]] # row col #[1,] 1 1 #[2,] 2 1 #[3,] 1 3 # #[[2]] # row col #[1,] 2 3 #[2,] 3 3 # ##..... </code></pre> Having that, we need to find out, progressively, whether an index in each element has a neighbour (i.e. the right/left/up/down cell) in the next element. E.g. we need something like: <pre class="prettyprint"><code>as.matrix(find_neighbours(matches[[1]], matches[[2]], dim(board))) # [,1] [,2] #[1,] FALSE FALSE #[2,] FALSE FALSE #[3,] TRUE FALSE </code></pre> which informs us, that the row 3 of <code>matches[[1]]</code> is a neighbour of row 1 of <code>matches[[2]]</code>, i.e. <code>[1, 3]</code> and <code>[2, 3]</code> are, indeed, neighbouring cells. We need this for each successive element in "matches": <pre class="prettyprint"><code>are_neighs = Map(function(x, y) which(find_neighbours(x, y, dim(board)), TRUE), matches[-length(matches)], matches[-1]) are_neighs #[[1]] # [,1] [,2] #[1,] 3 1 # #[[2]] # [,1] [,2] #[1,] 2 1 #[2,] 1 2 # #[[3]] # [,1] [,2] #[1,] 2 1 # #[[4]] # [,1] [,2] #[1,] 1 1 </code></pre> Now that we have the pairwise ("i" with "i + 1") neighbour matches we need to complete the chain. For this example we'd like to have a vector like <code>c(1, 2, 1, 1)</code> which contains the info that the row 1 of <code>are_neighs[[1]]</code> is chained with the row 2 of <code>are_neighs[[2]]</code> which is chained with row 1 of <code>are_neighs[[3]]</code> which is chained with row 1 of <code>are_neighs[[4]]</code>. This smells like an "igraph" problem, but I'm not so familiar with it (hopefully someone has a better idea), so here's a naive approach to get that chaining: <pre class="prettyprint"><code>row_connections = matrix(NA_integer_, nrow(are_neighs[[1]]), length(are_neighs)) row_connections[, 1] = 1:nrow(are_neighs[[1]]) cur = are_neighs[[1]][, 2] for(i in 1:(length(are_neighs) - 1)) { im = match(cur, are_neighs[[i + 1]][, 1]) cur = are_neighs[[i + 1]][, 2][im] row_connections[, i + 1] = im } row_connections = row_connections[complete.cases(row_connections), , drop = FALSE] </code></pre> Which returns: <pre class="prettyprint"><code>row_connections # [,1] [,2] [,3] [,4] #[1,] 1 2 1 1 </code></pre> Having this vector, now, we can extract the respective chain from "are_neighs": <pre class="prettyprint"><code>Map(function(x, i) x[i, ], are_neighs, row_connections[1, ]) #[[1]] #[1] 3 1 # #[[2]] #[1] 1 2 # #[[3]] #[1] 2 1 # #[[4]] #[1] 1 1 </code></pre> which can be used to extract the appropriate row/col chain of indices from "matches": <pre class="prettyprint"><code>ans = vector("list", nrow(row_connections)) for(i in 1:nrow(row_connections)) { connect = Map(function(x, i) x[i, ], are_neighs, row_connections[i, ]) ans[[i]] = do.call(rbind, Map(function(x, i) x[i, ], matches, c(connect[[1]][1], sapply(connect, "[", 2)))) } ans #[[1]] # row col #[1,] 1 3 #[2,] 2 3 #[3,] 3 3 #[4,] 3 4 #[5,] 4 4 </code></pre> Wrapping it all in a function (<code>find_neighbours</code> is defined inside): <pre class="prettyprint"><code>library(Matrix) ff = function(word, board) { matches = lapply(strsplit(word, NULL)[[1]], function(x) which(x == board, arr.ind = TRUE)) find_neighbours = function(x, y, d) { neighbours = function(i, j, d = d) { ij = rbind(cbind(i, j + c(-1L, 1L)), cbind(i + c(-1L, 1L), j)) ijr = ij[, 1]; ijc = ij[, 2] ij = ij[((ijr > 0L) & (ijr <= d[1])) & ((ijc > 0L) & (ijc <= d[2])), ] ij[, 1] + (ij[, 2] - 1L) * d[1] } x.neighs = lapply(1:nrow(x), function(i) neighbours(x[i, 1], x[i, 2], dim(board))) y = y[, 1] + (y[, 2] - 1L) * d[1] x.sparse = sparseMatrix(i = unlist(x.neighs), j = rep(seq_along(x.neighs), lengths(x.neighs)), x = 1L, dims = c(prod(d), length(x.neighs))) y.sparse = sparseMatrix(i = y, j = seq_along(y), x = 1L, dims = c(prod(d), length(y))) ans = crossprod(x.sparse, y.sparse, boolArith = TRUE) ans } are_neighs = Map(function(x, y) which(find_neighbours(x, y, dim(board)), TRUE), matches[-length(matches)], matches[-1]) row_connections = matrix(NA_integer_, nrow(are_neighs[[1]]), length(are_neighs)) row_connections[, 1] = 1:nrow(are_neighs[[1]]) cur = are_neighs[[1]][, 2] for(i in 1:(length(are_neighs) - 1)) { im = match(cur, are_neighs[[i + 1]][, 1]) cur = are_neighs[[i + 1]][, 2][im] row_connections[, i + 1] = im } row_connections = row_connections[complete.cases(row_connections), , drop = FALSE] ans = vector("list", nrow(row_connections)) for(i in 1:nrow(row_connections)) { connect = Map(function(x, i) x[i, ], are_neighs, row_connections[i, ]) ans[[i]] = do.call(rbind, Map(function(x, i) x[i, ], matches, c(connect[[1]][1], sapply(connect, "[", 2)))) } ans } </code></pre> We can try it: <pre class="prettyprint"><code>ff("APPLE", board) #[[1]] # row col #[1,] 1 3 #[2,] 2 3 #[3,] 3 3 #[4,] 3 4 #[5,] 4 4 </code></pre> And with more than one matches: <pre class="prettyprint"><code>ff("AQQP", board) #[[1]] # row col #[1,] 1 1 #[2,] 1 2 #[3,] 2 2 #[4,] 2 3 # #[[2]] # row col #[1,] 1 3 #[2,] 1 2 #[3,] 2 2 #[4,] 2 3 # #[[3]] # row col #[1,] 1 3 #[2,] 1 4 #[3,] 2 4 #[4,] 2 3 </code></pre> Although, it's flexible in returning multiple matches, it does not return all possible matches and, in a nutshell, that's because of the use of <code>match</code> when building the chain of neighbours -- a linear search could be used instead, but -at the moment- adds significant code complexity.

Finding specific strings in an array using R

Tags:

arrays

string

r

I am trying to find if there is a quick way to search for specific strings in arrays in R, kind of like the game Boggle, except you know the word upfront.

You are allowed to move in the following directions for the next letter of the string: up, down, right or left

Say for a simple example you have an array of the form:

> G    
A, Q, A, Q, Q,  
A, Q, P, Q, Q,   
Q, Q, P, L, Q,   
Q, Q, Q, E, Q

And you want to apply a function to G with the string APPLE, for the function to return TRUE, APPLE exists in this array, and FALSE if it doesn't.

Does there exist a pre-made function or package that can do this, or alternatively is there a smart way to do it, I'm relatively new to dealing with strings in R and I'm struggling to see a way.

Any help much appreciated. Thanks.

436

asked Jan 06 '17 14:01

user2915209

3 Answers

this will first check if there are any characters in your word that do not exist within the array and then will check if the number of characters in the array are sufficient to meet repeat letters in your word

word <- strsplit("APPLE", "")
pool <- c("A", "Q", "A", "Q",
          "Q", "A", "Q", "P",
          "Q", "Q", "Q", "Q",
          "P", "L", "Q", "Q",
          "Q", "Q", "E", "Q")

t.word <- table(word)
t.pool <- table(pool)

length(setdiff(names(t.word), names(t.pool))) == 0
min(t.pool[names(t.word)] - t.word) >= 0

the last two functions will both output TRUE to show that all the letters from word exist in pool and that the count of a single letter in word is not greater than that of pool

in function form that will output TRUE if found, otherwise FALSE

word.find <- function(word, pool) {
  t.word <- table(strsplit(word, ""))
  t.pool <- table(pool)
  length(setdiff(names(t.word), names(t.pool))) == 0 & min(t.pool[names(t.word)] - t.word) >= 0
}

word.find("APPLE", pool)
[1] TRUE

word.find("APPLES", pool)
[1] FALSE

word.find("APPLEE", pool)
[1] FALSE

187

answered Oct 22 '22 19:10

manotheshark

This function works using only base R

THE FUNCTION

search_string = function(matrix_array, word_to_search){

    position = data.frame(NA,NA,NA) #Create empty dataframe

    word_to_search_inv = sapply(lapply(strsplit(word_to_search, NULL), rev), paste, collapse="") #Reverse word_to_search

    for (i in 1:nrow(matrix_array)){
        str_row = paste((matrix_array[i,]),collapse = "") #Collapse entire row into a string
        if (grepl(word_to_search,str_row)) { #Check if the word_to_search is in the string towards right
            position = rbind(position,c(i,paste(gregexpr(word_to_search, str_row)[[1]], collapse = ', '),"RIGHT")) #Get position and add it to the dataframe      
        }
        if (grepl(word_to_search_inv,str_row)) {#Check if the word_to_search is in the string towards left (by checking for reverse of word_to_search)
            position = rbind(position,c(i,paste(gregexpr(word_to_search_inv, str_row)[[1]], collapse = ', '),"LEFT"))       
        }
    }

    for (j in 1:ncol(matrix_array)){        
        str_column = paste((matrix_array[,j]),collapse = "")
        if (grepl(word_to_search, str_column)) { #Check if the word_to_search is in the string towards down
            position = rbind(position, c(paste(gregexpr(word_to_search, str_column)[[1]], collapse = ', '),j,"DOWN"))
        }
        if (grepl(word_to_search_inv, str_column)) { #Check if the word_to_search is in the string towards up
            position = rbind(position, c(paste(gregexpr(word_to_search_inv, str_column)[[1]], collapse = ', '),j,"UP"))
        }
    }

    colnames(position) = c("ROW","COLUMN","DIRECTION")
    position = position[c(2:nrow(position)),]
    rownames(position) = NULL
    return(position) #Return the datafram containing row, columnm, and direction where word_to_match is found
}

USAGE

#Data
mydata = structure(c("A", "A", "Q", "Q", "D", "Q", "Q", "Q", "Q", "B", 
                     "A", "P", "P", "L", "E", "Q", "Q", "L", "E", "S", "Q", "Q", "Q", 
                     "Q", "T", "A", "P", "P", "L", "E"), .Dim = c(5L, 6L), .Dimnames = list(NULL, c("V1", "V2", 
                                                                           "V3", "V4", "V5", "V6")))

key = "APPLE"

#Run the function
pos = search_string(mydata,key)

answered Oct 22 '22 18:10

d.b

Adding another approach, having:

board = structure(c("A", "A", "Q", "Q", "Q", "Q", "Q", "Q", "A", "P", 
"P", "Q", "Q", "Q", "L", "E", "Q", "Q", "Q", "Q"), .Dim = 4:5, .Dimnames = list(
    NULL, NULL))

word = "APPLE"

we start with:

matches = lapply(strsplit(word, NULL)[[1]], function(x) which(x == board, arr.ind = TRUE))

which is a simple -probably unavoidable- search of indices of "board" that match each letter of the word. It's a "list" containing the row/col indices like:

#[[1]]
#     row col
#[1,]   1   1
#[2,]   2   1
#[3,]   1   3
#
#[[2]]
#     row col
#[1,]   2   3
#[2,]   3   3
#
##.....

Having that, we need to find out, progressively, whether an index in each element has a neighbour (i.e. the right/left/up/down cell) in the next element. E.g. we need something like:

as.matrix(find_neighbours(matches[[1]], matches[[2]], dim(board)))
#      [,1]  [,2]
#[1,] FALSE FALSE
#[2,] FALSE FALSE
#[3,]  TRUE FALSE

which informs us, that the row 3 of matches[[1]] is a neighbour of row 1 of matches[[2]], i.e. [1, 3] and [2, 3] are, indeed, neighbouring cells. We need this for each successive element in "matches":

are_neighs = Map(function(x, y) which(find_neighbours(x, y, dim(board)), TRUE), 
                 matches[-length(matches)], matches[-1])
are_neighs
#[[1]]
#     [,1] [,2]
#[1,]    3    1
#
#[[2]]
#     [,1] [,2]
#[1,]    2    1
#[2,]    1    2
#
#[[3]]
#     [,1] [,2]
#[1,]    2    1
#
#[[4]]
#     [,1] [,2]
#[1,]    1    1

Now that we have the pairwise ("i" with "i + 1") neighbour matches we need to complete the chain. For this example we'd like to have a vector like c(1, 2, 1, 1) which contains the info that the row 1 of are_neighs[[1]] is chained with the row 2 of are_neighs[[2]] which is chained with row 1 of are_neighs[[3]] which is chained with row 1 of are_neighs[[4]]. This smells like an "igraph" problem, but I'm not so familiar with it (hopefully someone has a better idea), so here's a naive approach to get that chaining:

row_connections = matrix(NA_integer_, nrow(are_neighs[[1]]), length(are_neighs))
row_connections[, 1] = 1:nrow(are_neighs[[1]])
cur = are_neighs[[1]][, 2]
for(i in 1:(length(are_neighs) - 1)) {
    im = match(cur, are_neighs[[i + 1]][, 1]) 
cur = are_neighs[[i + 1]][, 2][im]
row_connections[, i + 1] = im
}
row_connections = row_connections[complete.cases(row_connections), , drop = FALSE]

Which returns:

row_connections
#     [,1] [,2] [,3] [,4]
#[1,]    1    2    1    1

Having this vector, now, we can extract the respective chain from "are_neighs":

Map(function(x, i) x[i, ], are_neighs, row_connections[1, ])
#[[1]]
#[1] 3 1
#
#[[2]]
#[1] 1 2
#
#[[3]]
#[1] 2 1
#
#[[4]]
#[1] 1 1

which can be used to extract the appropriate row/col chain of indices from "matches":

ans = vector("list", nrow(row_connections))
for(i in 1:nrow(row_connections)) {
     connect = Map(function(x, i) x[i, ], are_neighs, row_connections[i, ])
     ans[[i]] = do.call(rbind, Map(function(x, i) x[i, ], matches, c(connect[[1]][1], sapply(connect, "[", 2))))
}
ans
#[[1]]
#     row col
#[1,]   1   3
#[2,]   2   3
#[3,]   3   3
#[4,]   3   4
#[5,]   4   4

Wrapping it all in a function (find_neighbours is defined inside):

library(Matrix)
ff = function(word, board)
{
    matches = lapply(strsplit(word, NULL)[[1]], function(x) which(x == board, arr.ind = TRUE))

    find_neighbours = function(x, y, d)
    {
        neighbours = function(i, j, d = d) 
        {
            ij = rbind(cbind(i, j + c(-1L, 1L)), cbind(i + c(-1L, 1L), j))
            ijr = ij[, 1]; ijc = ij[, 2]
            ij = ij[((ijr > 0L) & (ijr <= d[1])) & ((ijc > 0L) & (ijc <= d[2])), ]

            ij[, 1] + (ij[, 2] - 1L) * d[1]
        }

        x.neighs = lapply(1:nrow(x), function(i) neighbours(x[i, 1], x[i, 2], dim(board)))
        y = y[, 1] + (y[, 2] - 1L) * d[1]

        x.sparse = sparseMatrix(i = unlist(x.neighs), 
                                j = rep(seq_along(x.neighs), lengths(x.neighs)), 
                                x = 1L, dims = c(prod(d), length(x.neighs)))
        y.sparse = sparseMatrix(i = y, j = seq_along(y), x = 1L, dims = c(prod(d), length(y)))                         

        ans = crossprod(x.sparse, y.sparse, boolArith = TRUE)

        ans
    }      

    are_neighs = Map(function(x, y) which(find_neighbours(x, y, dim(board)), TRUE), matches[-length(matches)], matches[-1])

    row_connections = matrix(NA_integer_, nrow(are_neighs[[1]]), length(are_neighs))
    row_connections[, 1] = 1:nrow(are_neighs[[1]])
    cur = are_neighs[[1]][, 2]
    for(i in 1:(length(are_neighs) - 1)) {
        im = match(cur, are_neighs[[i + 1]][, 1]) 
        cur = are_neighs[[i + 1]][, 2][im]
        row_connections[, i + 1] = im
    }
    row_connections = row_connections[complete.cases(row_connections), , drop = FALSE]

    ans = vector("list", nrow(row_connections))
    for(i in 1:nrow(row_connections)) {
        connect = Map(function(x, i) x[i, ], are_neighs, row_connections[i, ])
        ans[[i]] = do.call(rbind, Map(function(x, i) x[i, ], matches, c(connect[[1]][1], sapply(connect, "[", 2))))
    }
    ans
}

We can try it:

ff("APPLE", board)
#[[1]]
#     row col
#[1,]   1   3
#[2,]   2   3
#[3,]   3   3
#[4,]   3   4
#[5,]   4   4

And with more than one matches:

ff("AQQP", board)
#[[1]]
#     row col
#[1,]   1   1
#[2,]   1   2
#[3,]   2   2
#[4,]   2   3
#
#[[2]]
#     row col
#[1,]   1   3
#[2,]   1   2
#[3,]   2   2
#[4,]   2   3
#
#[[3]]
#     row col
#[1,]   1   3
#[2,]   1   4
#[3,]   2   4
#[4,]   2   3

Although, it's flexible in returning multiple matches, it does not return all possible matches and, in a nutshell, that's because of the use of match when building the chain of neighbours -- a linear search could be used instead, but -at the moment- adds significant code complexity.

answered Oct 22 '22 19:10

alexis_laz

Related questions
                            
                                Why do arrays of numbers, more data sort faster than arrays of objects, fewer data in Javascript?
                            
                                Ajax Parse Error when returning array from PHP
                            
                                Longest positive subarray
                            
                                How to JSON.stringify an array of objects
                            
                                How is the Java erasure affecting the generic arrays?
                            
                                Why are Swift iterators slower than array building?
                            
                                how to stop array.filter() function
                            
                                Convert structured array to numpy array for use with Scikit-Learn
                            
                                Filter array of objects by property in nested array of objects
                            
                                PHP's array_merge_recursive behaviour on integer keys
                            
                                Java class vs array memory size?
                            
                                Convert an array of pointers to an array of IntPtr
                            
                                Difference between const and constexpr arrays
                            
                                Why does add_index using 'gin' create a 'btree' index instead?
                            
                                Accessing a one-dimensional array as a two-dimensional array
                            
                                Array as a Class Member
                            
                                What is the reason for the overhead in memory usage for arrays in Java?
                            
                                Laravel saving dynamic input data array in to database
                            
                                Why does the efficiency of numpy not scale
                            
                                How to make a custom list deserializer in Gson?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With