Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find smallest repeated piece of a list

I've got some list with integers like:

l1 = [8,9,8,9,8,9,8], 
l2 = [3,4,2,4,3]

My purpose to slice it into the smallest repeated piece. So:

output_l1 = [8,9]
output_l2 = [3,4,2,4]

Biggest problem that the sequences not fully finished every time. So not

'abcabcabc'

just

'abcabcab'.

like image 589
Tóth Tamás Avatar asked Oct 16 '22 07:10

Tóth Tamás


2 Answers

def shortest_repeating_sequence(inp):
    for i in range(1, len(inp)):
        if all(inp[j] == inp[j % i] for j in range(i, len(inp))):
            return inp[:i]

    # inp doesn't have a repeating pattern if we got this far
    return inp[:]

This code is O(n^2). The worst case is one element repeated a lot of times followed by something that breaks the pattern at the end, for example [1, 1, 1, 1, 1, 1, 1, 1, 1, 8].

You start with 1, and then iterate over the entire list checking that each inp[i] is equal to inp[i % 1]. Any number % 1 is equal to 0, so you're checking if each item in the input is equal to the first item in the input. If all items are equal to the first element then the repeating pattern is a list with just the first element so we return inp[:1].

If at some point you hit an element that isn't equal to the first element (all() stops as soon as it finds a False), you try with 2. So now you're checking if each element at an even index is equal to the first element (4 % 2 is 0) and if every odd index is equal to the second item (5 % 2 is 1). If you get all the way through this, the pattern is the first two elements so return inp[:2], otherwise try again with 3 and so on.

You could do range(1, len(inp)+1) and then the for loop will handle the case where inp doesn't contain a repeating pattern, but then you have to needlessly iterate over the entire inp at the end. And you'd still have to have to have return [] at the end to handle inp being the empty list.

I return a copy of the list (inp[:]) instead of the list to have consistent behavior. If I returned the original list with return inp and someone called that function on a list that didn't have a repeating pattern (ie their repeating pattern is the original list) and then did something with the repeating pattern, it would modify their original list as well.

shortest_repeating_sequence([4, 2, 7, 4, 6])  # no pattern
[4, 2, 7, 4, 6]
shortest_repeating_sequence([2, 3, 1, 2, 3])  # pattern doesn't repeat fully
[2, 3, 1]
shortest_repeating_sequence([2, 3, 1, 2])     # pattern doesn't repeat fully
[2, 3, 1]
shortest_repeating_sequence([8, 9, 8, 9, 8, 9, 8])
[8, 9]
shortest_repeating_sequence([1, 1, 1, 1, 1])
[1]
shortest_repeating_sequence([])
[]
like image 197
Boris Avatar answered Oct 21 '22 02:10

Boris


The following code is a rework of your solution that addresses some issues:

  1. Your solution as posted doesn't handle your own 'abcabcab' example.

  2. Your solution keeps processing even after it's found a valid result, and then filters through both the valid and non-valid results. Instead, once a valid result is found, we process and return it. Additional valid results, and non-valid results, are simply ignored.

  3. @Boris' issue regarding returning the input if there is no repeating pattern.

CODE

def repeated_piece(target):
    target = list(target)
    length = len(target)

    for final in range(1, length):
        result = []

        while len(result) < length:
            for i in target[:final]:
                result.append(i)

        if result[:length] == target:
            return result[:final]

    return target

l1 = [8, 9, 8, 9, 8, 9, 8]
l2 = [3, 4, 2, 4, 3]
l3 = 'abcabcab'
l4 = [1, 2, 3]

print(*repeated_piece(l1), sep='')
print(*repeated_piece(l2), sep='')
print(*repeated_piece(l3), sep='')
print(*repeated_piece(l4), sep='')

OUTPUT

% python3 test.py
89
3424
abc
123
%

You can still use:

print(''.join(map(str, repeated_piece(l1))))

if you're uncomfortable with the simpler Python 3 idiom:

print(*repeated_piece(l1), sep='')
like image 25
cdlane Avatar answered Oct 21 '22 03:10

cdlane