Finding pattern in a matrix in R

Tags: r, matrix

I have an n x 8 matrix (n rows, 8 columns), for instance:

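# Note: R >= 3.6.0 changed the default algorithm used by sample(); on those
# versions, calling RNGkind(sample.kind = "Rounding") before set.seed()
# should reproduce the values shown below.
set.seed(12345)
m <- matrix(sample(1:50, 800, replace = TRUE), ncol = 8)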
head(m)

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]   37   15   30    3    4   11   35   31
[2,]   44   31   45   30   24   39    1   18
[3,]   39   49    7   36   14   43   26   24
[4,]   45   31   26   33   12   47   37   15
[5,]   23   27   34   29   30   34   17    4
[6,]    9   46   39   34    8   43   42   37

I would like to find a certain pattern in the matrix. For instance, I would like to know where a line containing a 37 is followed by a line containing both a 10 and a 29, which is in turn followed by a line containing a 42.

This happens, for instance, in lines 57:59 of the above matrix (matching values are marked with *):

m[57:59,]
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]  *37   35    1   30   47    9   12   39
[2,]    5   22  *10  *29   13    5   17   36
[3,]   22   43    6    2   27   35  *42   50

A (probably inefficient) solution is to get all the lines containing 37 with

sapply(1:nrow(m), function(x){37 %in% m[x,]})

and then use a few loops to test the other conditions.
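For reference, the loop-based approach described above could look roughly like the sketch below. The function name `find.pattern.loop`, the list-based pattern format (one numeric vector per line, `numeric(0)` for a "hole"), and the small `toy` matrix are all made up for illustration; they are not part of the original question.

```r
# Hypothetical brute-force helper: try every possible starting row in turn.
# `pattern` is a list with one numeric vector per consecutive line;
# an empty vector (numeric(0)) is a "hole" that matches any line.
find.pattern.loop <- function(m, pattern) {
  n.starts <- nrow(m) - length(pattern) + 1
  if (n.starts < 1) return(integer(0))
  hits <- integer(0)
  for (start in seq_len(n.starts)) {
    ok <- TRUE
    for (k in seq_along(pattern)) {
      # every value requested for line k must appear in row start + k - 1
      if (!all(pattern[[k]] %in% m[start + k - 1, ])) {
        ok <- FALSE
        break
      }
    }
    if (ok) hits <- c(hits, start)
  }
  hits
}

# Small hand-built matrix (not the random one above), pattern present twice
toy <- matrix(c( 1, 37,  3,
                10,  4, 29,
                42,  5,  6,
                37,  7,  8,
                29, 10,  9,
                 1, 42,  2),
              ncol = 3, byrow = TRUE)

find.pattern.loop(toy, list(37, c(10, 29), 42))
# [1] 1 4
```

This only checks that each line contains the requested values; it does one pass over the rows per candidate start, which is the inefficiency the question is about.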

How could I write an efficient function to do this that can be generalized to any user-given pattern (not necessarily over 3 lines, possibly with "holes", with a variable number of values in each line, etc.)?

EDIT: to answer various comments:

  • I need to find the EXACT pattern.
  • The order within a row does not matter (if it makes things easier, values can be sorted within each row).
  • The lines have to be adjacent.
  • I want to get the (starting) position of every occurrence of the pattern (i.e., if the pattern is present multiple times in the matrix, I want multiple return values).
  • The user would enter the pattern via a GUI; I have yet to decide how. For instance, to search for the above pattern they may write something like

37;10,29;42

Here ; starts a new line and , separates values on the same line. Similarly, we may look for

50,51;;75;80,81

meaning 50 and 51 in line n, no constraint on line n+1, 75 in line n+2, and 80 and 81 in line n+3.

asked Jan 04 '13 10:01 by nico


1 Answer

This reads easily and is hopefully generalizable enough for you:

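# TRUE for each row of m that contains the value at least once
has.37 <- rowSums(m == 37) > 0
has.10 <- rowSums(m == 10) > 0
has.29 <- rowSums(m == 29) > 0
has.42 <- rowSums(m == 42) > 0

# Shift a logical vector up by `lag` rows, padding the end with FALSE,
# so that position i reports on row i + lag (note: this masks stats::lag)
lag <- function(x, lag) c(tail(x, -lag), rep(FALSE, lag))

which(has.37 & lag(has.10, 1) & lag(has.29, 1) & lag(has.42, 2))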
# [1] 57
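To see what the shifting helper does, here is a tiny illustration on a hand-written vector. The name `shift.up` is made up here (it is the same one-liner as `lag` above, renamed so it does not mask `stats::lag`), and `has.10` is a toy vector rather than one computed from `m`.

```r
# Same logic as the lag() helper above, under a hypothetical name
shift.up <- function(x, k) c(tail(x, -k), rep(FALSE, k))

# Toy indicator: rows 2 and 5 contain a 10
has.10 <- c(FALSE, TRUE, FALSE, FALSE, TRUE)

shift.up(has.10, 1)
# [1]  TRUE FALSE FALSE  TRUE FALSE
# Position i now answers "does row i + 1 contain a 10?", so it can be
# combined with & against an indicator for row i.

# Caveat: a shift of 0 does not return x unchanged, because tail(x, 0) is empty
length(shift.up(has.10, 0))
# [1] 0
```

That last caveat is presumably why the generalized version treats a lag of 0 as a separate case.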

Edit: here is a generalization that can use positive and negative lags:

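find.combo <- function(m, pattern.df) {

   # Shift v by i rows (up for i > 0, down for i < 0), padding with FALSE
   lag <- function(v, i) {
      if (i == 0) v else
      if (i > 0)  c(tail(v, -i), rep(FALSE, i)) else
      c(rep(FALSE, -i), head(v, i))
   }

   # For one (value, lag) pair: does row n + lag contain the value?
   find.one <- function(x, i) lag(rowSums(m == x) > 0, i)
   # One column per (value, lag) pair, one row per row of m
   matches  <- mapply(find.one, pattern.df$value, pattern.df$lag)
   # Keep the rows where every pair matches
   which(rowSums(matches) == ncol(matches))

}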

Tested here:

pattern.df <- data.frame(value = c(40, 37, 10, 29, 42),
                         lag   = c(-1,  0,  1,  1,  2))

find.combo(m, pattern.df)
# [1] 57
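Since the random matrix depends on the R version, the following self-contained sketch checks the same function on a small hand-built matrix in which the pattern occurs twice. `find.combo` is copied from above; the `toy` matrix and both patterns are made up for illustration.

```r
# find.combo as defined in the answer above
find.combo <- function(m, pattern.df) {
   lag <- function(v, i) {
      if (i == 0) v else
      if (i > 0)  c(tail(v, -i), rep(FALSE, i)) else
      c(rep(FALSE, -i), head(v, i))
   }
   find.one <- function(x, i) lag(rowSums(m == x) > 0, i)
   matches  <- mapply(find.one, pattern.df$value, pattern.df$lag)
   which(rowSums(matches) == ncol(matches))
}

# Hypothetical 6 x 3 matrix: 37 / 10,29 / 42 starts at rows 1 and 4
toy <- matrix(c( 1, 37,  3,
                10,  4, 29,
                42,  5,  6,
                37,  7,  8,
                29, 10,  9,
                 1, 42,  2),
              ncol = 3, byrow = TRUE)

# Every occurrence is reported, not just the first
find.combo(toy, data.frame(value = c(37, 10, 29, 42),
                           lag   = c( 0,  1,  1,  2)))
# [1] 1 4

# Negative lag: rows containing 42 whose previous row contains 10
find.combo(toy, data.frame(value = c(42, 10),
                           lag   = c( 0, -1)))
# [1] 3 6
```

The returned positions are always the rows at lag 0, so with negative lags the result is not necessarily the first row of the pattern.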

Edit2: following the OP's edit regarding a GUI input, here is a function that transforms the GUI input into the pattern.df my find.combo function expects:

convert.gui.input <- function(string) {
   rows   <- strsplit(string, ";")[[1]]
   values <- strsplit(rows,   ",")
   data.frame(value = as.numeric(unlist(values)),
              lag = rep(seq_along(values), sapply(values, length)) - 1)
}

Tested here:

find.combo(m, convert.gui.input("37;10,29;42"))
# [1] 57
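The second example from the question, with an empty line ("hole") in the pattern, appears to be handled as well: `strsplit` keeps the empty string between the two consecutive semicolons, and that empty entry contributes no values but still advances the lag. A quick self-contained check, reusing the function above unchanged:

```r
# convert.gui.input as defined in the answer above
convert.gui.input <- function(string) {
   rows   <- strsplit(string, ";")[[1]]
   values <- strsplit(rows,   ",")
   data.frame(value = as.numeric(unlist(values)),
              lag = rep(seq_along(values), sapply(values, length)) - 1)
}

# "50,51;;75;80,81": line n+1 is left empty, so no value gets lag 1
convert.gui.input("50,51;;75;80,81")
#   value lag
# 1    50   0
# 2    51   0
# 3    75   2
# 4    80   3
# 5    81   3
```

Note that `strsplit` drops a trailing empty string, so a hole at the very end of the input (e.g. "37;42;") is silently ignored, which is harmless since it adds no constraint.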
answered Oct 20 '22 02:10 by flodel