Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding patterns in list

I am currently searching for a way to find patterns in a list of integers, but the method I am going to use would be applicable to strings and other lists with different elements of course. Now let me explain what I am looking for.

I want to find the longest repeating pattern in a list of integers. For example,

[1, 2, 3, 4, 1, 2, 3]
# This list would give 1, 2, 3

Overlapping patterns should be discarded. ( Not certain )

[1, 1, 1, 1, 1]
# Should give 1, 1  Not 1, 1, 1, 1

Here is what did not help me.

Finding patterns in a list (Did not understand the logic behind the first answer, very little explanation. And second answer solves the problem only if the pattern is known before solving.)

Finding integer pattern from a list (Pattern is given and number of occurrence is wanted. Different than my question.)

Longest common subsequence problem (Most people dealed with this problem however it is not close to mine. I need consecutive elements while searching for a pattern. However in this, seperate elements also counted as subsequences.)

Here what I tried.

def pattern(seq):
    n = len(seq)
    c = defaultdict(int) # Counts of each subsequence
    for i in xrange(n):
        for j in xrange(i + 1, min(n, n / 2 + i)): 
            # Used n / 2 because I figured if a pattern is being searched
            # It cant be longer that the half of the list.
            c[tuple(seq[i:j])] += 1      
    return c

As you see, It finds all the sublists and check for repeats. I found this approach a bit naive(and inefficient) and I am in need of a better way. Please help me. Thanks in advance.

Note1: The list is predetermined but because of my algorithms failure, I can only check some parts of the list before freezing the computer. So the pattern I am trying to find can very well be longer than the half of the search list, It can even be longer than the search list itself because I am searching only a part of the original list.If you present a better method than I am using, I can search a larger part of the original list so I will have a better chance at finding the pattern. (If there is one.)

Note2: Here is a part of the list if you want to test it yourself. It really seems like that there is a pattern, but I cannot be sure before I test it with a reliable code. Sample List

Note3: I approach this as a serious problem of data mining. And will try to learn if you make a long explanation. This feels like a much more important problem than LCS, however LCS is much more popular :D This method I am trying to find feels like the methods scientists use to find DNA patterns.

like image 370
Max Paython Avatar asked Aug 04 '16 14:08

Max Paython


People also ask

How do you match a pattern to a list in Python?

Method : Using join regex + loop + re.match() This task can be performed using combination of above functions. In this, we create a new regex string by joining all the regex list and then match the string against it to check for match using match() with any of the element of regex list.


1 Answers

The Code (updated for Python 2 + 3)

Ignoring the "no overlapping" requirement, here's the code I used:

import collections
  
def pattern(seq):
        storage = {}
        for length in range(1,int(len(seq)/2)+1):
                valid_strings = {}
                for start in range(0,len(seq)-length+1):
                        valid_strings[start] = tuple(seq[start:start+length])
                candidates = set(valid_strings.values())
                if len(candidates) != len(valid_strings):
                        print("Pattern found for " + str(length))
                        storage = valid_strings
                else:
                        print("No pattern found for " + str(length))
                        break
        return set(v for v in storage.values() if list(storage.values()).count(v) > 1)

Using that, I found 8 distinct patterns of length 303 in your dataset. The program ran pretty fast, too.

Pseudocode Version

define patterns(sequence):
    list_of_substrings = {}
    for each valid length:  ### i.e. lengths from 1 to half the list's length
        generate a dictionary my_dict of all sub-lists of size length
        if there are repeats:
            list_of_substrings = my_dict
        else:
            return all repeated values in list_of_substrings
    return list_of_substrings  #### returns {} when there are no patterns
like image 137
Naveen Arun Avatar answered Oct 03 '22 19:10

Naveen Arun