Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Check if list is valid sequence of chunks

I want to check whether a list is a valid sequence of chunks, where each chunk begins with some value and ends with the next occurrence of the same value. For example, this is a valid sequence of three chunks:

lst = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9, 0, 4, 5, 2]
       \___________/  \_____/  \_______________________/

And this is one is invalid:

lst = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9, 0, 4]
       \___________/  \_____/  \_____ ... missing the 2 to end the chunk

I have a solution but it's bad. Do you see something better?

def is_valid(lst):
    while lst:
        start = lst.pop(0)
        if start not in lst:
            return False
        while lst[0] != start:
            lst.pop(0)
        lst.remove(start)
    return True

# Tests, should print: True, False, True, False, True
print(is_valid([2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9, 0, 4, 5, 2]))
print(is_valid([2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4, 5, 9, 0, 4]))
print(is_valid(['I', 'N', 'O', 'A', 'I', 'L', 'L', 'T', 'R', 'X', 'I', 'I', 'N', 'X', 'F', 'T']))
print(is_valid(['T', 'I', 'N', 'I', 'X', 'R', 'O', 'F', 'T', 'I', 'N', 'I', 'X', 'L', 'L', 'A']))
print(is_valid([]))
like image 968
no comment Avatar asked Sep 07 '21 18:09

no comment


Video Answer


4 Answers

How about this, creating an iter from the list and searching forward on that iter until the next matching element is found. Note that this might fail is None can be an element of the list; then you should rather define and compare against a sentinel obj = object().

def is_valid(lst):
    it = iter(lst)
    for x in it:
        if next((y for y in it if y == x), None) is None:
            return False
    return True

Since we don't actually need the value returned by next, we can also just use any instead, at the same time solving the problem of the default element. Like next, any will consume the iterator just as far as the matching element, if any:

def is_valid(lst):
    it = iter(lst)
    for x in it:
        if not any(y == x for y in it):
            return False
    return True

This can be further shortened using all instead of the outer for loop:

def is_valid(lst):
    it = iter(lst)
    return all(any(y == x for y in it) for x in it)

And this can finally be reduced to the equally cryptic and intriguing:

def is_valid(lst):
    it = iter(lst)
    return all(x in it for x in it)

Each way, each element is visited exactly once, the original list is not changed, little to no extra space, and IMHO it's even somewhat easy to read and understand.


This never was about speed, but anyway: Here are some benchmarks of the different solutions (and some more variations), running the test cases from the question as well as two random lists of 1,000 integers, one valid and one invalid, 10,000 times, on Python 3.8.10:

# with long lists             # only short test lists
1.52 is_valid_index           0.22 is_valid_index
3.28 is_valid_next            0.30 is_valid_next
2.78 is_valid_for_for_else    0.13 is_valid_for_for_else
5.26 is_valid_for_any         0.32 is_valid_for_any
5.29 is_valid_all_any         0.38 is_valid_all_any
3.42 is_valid_all_any_if      0.36 is_valid_all_any_if
2.02 is_valid_all_in          0.18 is_valid_all_in
1.97 is_valid_all_in_if       0.17 is_valid_all_in_if
1.87 is_valid_for_in          0.11 is_valid_for_in

Of course, all are O(n). With the long 1000-element-lists, the solution using index is fastest, but the one with x in it is not too bad, either. The any solutions lag somewhat behind, but are about as fast (or slow) as next when using a generator with condition, but still slower than when using plain for loops. With only the short test-lists, it's a bit different: Here, the solutions using one iterator and for-for-else and for-in are fastest by quite some margin.

like image 79
tobias_k Avatar answered Oct 19 '22 17:10

tobias_k


Here's my take on the problem. I've optimized for readability, not speed (while keeping it in O(n) of course):

def is_valid(sequence):
    iterator = iter(sequence)
    for element in iterator:
        for other in iterator:
            if element == other:
                break
        else:
            return False
    return True

Each iteration of the outer loop corresponds to a chunk. When we're out of elements here we ended the sequence at a chunk border, and we can return True. Otherwise, we loop through the iterator until we find a matching element. If we run out of elements (a for-loop that "naturally" terminates, without break, goes into its else) we return False.


And here's another one using itertools. I would not prefer it over the solution above, mostly because of the arcane use of next with a sentinel:

from itertools import dropwhile

def is_valid(iterable):
    iterator = iter(iterable)
    sentinel = object()
    for element in iterator:
        if next(dropwhile(lambda x: x != element, iterator), sentinel) is sentinel:
            return False
    return True
like image 27
L3viathan Avatar answered Oct 19 '22 17:10

L3viathan


Mutating the list with pop(0) is costly and not needed.

You could use index... this may be particularly fast when the chunks are large:

def is_valid(lst):
    i = 0
    n = len(list)
    while i < n:
        try:
            i = lst.index(lst[i], i + 1) + 1
        except:
            return False
    return True
like image 3
trincot Avatar answered Oct 19 '22 17:10

trincot


It seems like you want to make sure the last "chunk" is closed at the end of the list. This ought to do that:

def is_valid(lst):
  search = None
  paired = True
  for item in lst:
    if paired:
      search = item
      paired = False
    elif search == item:
      paired = True
  return paired

This is O(n), checks each element only one time, so you won't pay a cost for your start not in lst check that is expensive for long input lists.

like image 3
g.d.d.c Avatar answered Oct 19 '22 18:10

g.d.d.c