Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to split a list in two at the point where predicate is first False

Tags:

python

I keep thinking there should be a function for this, but I've searched the likely places (google, itertools docs, list methods, other SO questions), but nowhere found quite what I was looking for.

Naive and working implementation:

def split_at_first_false(pred, seq):
    first = []
    second = []
    true_so_far = True
    for item in seq:
        if true_so_far and pred(item):
            first.append(item)
        else:
            true_so_far = False
            second.append(item)
    return first, second

print split_at_first_false(str.isalpha, "abc1a2b")
# (['a', 'b', 'c'], ['1', 'a', '2', 'b'])

It works, but it doesn't feel right. There should be a better way to do this!

EDIT: I ended up with using a slightly modified version of senderle's final suggestion after reviewing the answers:

from itertools import chain

def split_at_pred(pred, seq):
    head = []
    it = iter(seq)
    for i in it:
        if not pred(i):
            head.append(i)
        else:
            return iter(head), chain([i], it)
    return iter(head), iter([])

It's short and elegant, output is two iterators no matter the input (strings, lists, iterators), and as a bonus, it even works with the following input:

from itertools import count
split_at_pred(lambda x: x == 5, count())

The other solutions, those that work at all with iterators, will run out of memory with this input. (Note that this is just a bonus. Infinite iterators was something I hadn't even considered when I wrote this question)

like image 774
Lauritz V. Thaulow Avatar asked Jul 13 '11 13:07

Lauritz V. Thaulow


1 Answers

This seems like a job for itertools.

>>> first = list(itertools.takewhile(str.isalpha, l))
>>> second = list(itertools.dropwhile(str.isalpha, l))
>>> first
['a', 'b', 'c']
>>> second
['1', 'a', '2', 'b']

This needs to be altered if l is an iterator rather than a sequence.

>>> def bisect_iter(pred, i):
...     i1, i2 = itertools.tee(i)
...     return itertools.takewhile(pred, i1), itertools.dropwhile(pred, i2)
... 
>>> i1, i2 = bisect_iter(str.isalpha, iter(l))
>>> list(i1)
['a', 'b', 'c']
>>> list(i2)
['1', 'a', '2', 'b']

The downside of tee is that the initial values are cached and tested twice (by both takewhile and dropwhile). That's wasteful. But caching values is unavoidable if you want to both accept and return iterators.

However, if you can return lists from an iterator, I can think of one solution that doesn't make extra copies or tests, and it's very close to yours:

>>> def bisect_iter_to_list(pred, it):
...     l1 = []
...     for i in it:
...         if pred(i):
...             l1.append(i)
...         else:
...             l2 = [i]
...             l2.extend(it)
...     return l1, l2
... 
>>> bisect_iter_to_list(str.isalpha, iter(l))
(['a', 'b', 'c'], ['1', 'a', '2', 'b'])

The only sneaky bit is that where there would normally be a break statement (i.e. after the else clause), I've simply consumed the iterator, causing the for loop to terminate early.

Finally, if you still want to return iterators, but don't want to do extra tests, here's a variation on the above that I believe is optimal.

>>> def bisect_any_to_iter(pred, it):
...     it = iter(it)
...     head = []
...     for i in it:
...         if pred(i):
...             head.append(i)
...         else:
...             tail = itertools.chain([i], it)
...             break
...     return iter(head), tail
... 
>>> a, b = bisect_iter_to_iter(str.isalpha, iter(l))
>>> list(a)
['a', 'b', 'c']
>>> list(b)
['1', 'a', '2', 'b']
like image 91
senderle Avatar answered Sep 19 '22 19:09

senderle