I am currently searching for a way to find patterns in a list of integers, but the method I am going to use would be applicable to strings and other lists with different elements of course. Now let me explain what I am looking for.
I want to find the longest repeating pattern in a list of integers. For example,
[1, 2, 3, 4, 1, 2, 3]
# This list would give 1, 2, 3
Overlapping patterns should be discarded. ( Not certain )
[1, 1, 1, 1, 1]
# Should give 1, 1 Not 1, 1, 1, 1
Here is what did not help me.
Finding patterns in a list (Did not understand the logic behind the first answer, very little explanation. And second answer solves the problem only if the pattern is known before solving.)
Finding integer pattern from a list (Pattern is given and number of occurrence is wanted. Different than my question.)
Longest common subsequence problem (Most people dealed with this problem however it is not close to mine. I need consecutive elements while searching for a pattern. However in this, seperate elements also counted as subsequences.)
Here what I tried.
def pattern(seq):
n = len(seq)
c = defaultdict(int) # Counts of each subsequence
for i in xrange(n):
for j in xrange(i + 1, min(n, n / 2 + i)):
# Used n / 2 because I figured if a pattern is being searched
# It cant be longer that the half of the list.
c[tuple(seq[i:j])] += 1
return c
As you see, It finds all the sublists and check for repeats. I found this approach a bit naive(and inefficient) and I am in need of a better way. Please help me. Thanks in advance.
Note1: The list is predetermined but because of my algorithms failure, I can only check some parts of the list before freezing the computer. So the pattern I am trying to find can very well be longer than the half of the search list, It can even be longer than the search list itself because I am searching only a part of the original list.If you present a better method than I am using, I can search a larger part of the original list so I will have a better chance at finding the pattern. (If there is one.)
Note2: Here is a part of the list if you want to test it yourself. It really seems like that there is a pattern, but I cannot be sure before I test it with a reliable code. Sample List
Note3: I approach this as a serious problem of data mining. And will try to learn if you make a long explanation. This feels like a much more important problem than LCS, however LCS is much more popular :D This method I am trying to find feels like the methods scientists use to find DNA patterns.
Method : Using join regex + loop + re.match() This task can be performed using combination of above functions. In this, we create a new regex string by joining all the regex list and then match the string against it to check for match using match() with any of the element of regex list.
Ignoring the "no overlapping" requirement, here's the code I used:
import collections
def pattern(seq):
storage = {}
for length in range(1,int(len(seq)/2)+1):
valid_strings = {}
for start in range(0,len(seq)-length+1):
valid_strings[start] = tuple(seq[start:start+length])
candidates = set(valid_strings.values())
if len(candidates) != len(valid_strings):
print("Pattern found for " + str(length))
storage = valid_strings
else:
print("No pattern found for " + str(length))
break
return set(v for v in storage.values() if list(storage.values()).count(v) > 1)
Using that, I found 8 distinct patterns of length 303 in your dataset. The program ran pretty fast, too.
define patterns(sequence):
list_of_substrings = {}
for each valid length: ### i.e. lengths from 1 to half the list's length
generate a dictionary my_dict of all sub-lists of size length
if there are repeats:
list_of_substrings = my_dict
else:
return all repeated values in list_of_substrings
return list_of_substrings #### returns {} when there are no patterns
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With