I keep thinking there should be a function for this, but I've searched the likely places (google, itertools docs, list methods, other SO questions), but nowhere found quite what I was looking for.
Naive and working implementation:
def split_at_first_false(pred, seq):
first = []
second = []
true_so_far = True
for item in seq:
if true_so_far and pred(item):
first.append(item)
else:
true_so_far = False
second.append(item)
return first, second
print split_at_first_false(str.isalpha, "abc1a2b")
# (['a', 'b', 'c'], ['1', 'a', '2', 'b'])
It works, but it doesn't feel right. There should be a better way to do this!
EDIT: I ended up with using a slightly modified version of senderle's final suggestion after reviewing the answers:
from itertools import chain
def split_at_pred(pred, seq):
head = []
it = iter(seq)
for i in it:
if not pred(i):
head.append(i)
else:
return iter(head), chain([i], it)
return iter(head), iter([])
It's short and elegant, output is two iterators no matter the input (strings, lists, iterators), and as a bonus, it even works with the following input:
from itertools import count
split_at_pred(lambda x: x == 5, count())
The other solutions, those that work at all with iterators, will run out of memory with this input. (Note that this is just a bonus. Infinite iterators was something I hadn't even considered when I wrote this question)
This seems like a job for itertools.
>>> first = list(itertools.takewhile(str.isalpha, l))
>>> second = list(itertools.dropwhile(str.isalpha, l))
>>> first
['a', 'b', 'c']
>>> second
['1', 'a', '2', 'b']
This needs to be altered if l
is an iterator rather than a sequence.
>>> def bisect_iter(pred, i):
... i1, i2 = itertools.tee(i)
... return itertools.takewhile(pred, i1), itertools.dropwhile(pred, i2)
...
>>> i1, i2 = bisect_iter(str.isalpha, iter(l))
>>> list(i1)
['a', 'b', 'c']
>>> list(i2)
['1', 'a', '2', 'b']
The downside of tee
is that the initial values are cached and tested twice (by both takewhile
and dropwhile
). That's wasteful. But caching values is unavoidable if you want to both accept and return iterators.
However, if you can return lists from an iterator, I can think of one solution that doesn't make extra copies or tests, and it's very close to yours:
>>> def bisect_iter_to_list(pred, it):
... l1 = []
... for i in it:
... if pred(i):
... l1.append(i)
... else:
... l2 = [i]
... l2.extend(it)
... return l1, l2
...
>>> bisect_iter_to_list(str.isalpha, iter(l))
(['a', 'b', 'c'], ['1', 'a', '2', 'b'])
The only sneaky bit is that where there would normally be a break
statement (i.e. after the else
clause), I've simply consumed the iterator, causing the for
loop to terminate early.
Finally, if you still want to return iterators, but don't want to do extra tests, here's a variation on the above that I believe is optimal.
>>> def bisect_any_to_iter(pred, it):
... it = iter(it)
... head = []
... for i in it:
... if pred(i):
... head.append(i)
... else:
... tail = itertools.chain([i], it)
... break
... return iter(head), tail
...
>>> a, b = bisect_iter_to_iter(str.isalpha, iter(l))
>>> list(a)
['a', 'b', 'c']
>>> list(b)
['1', 'a', '2', 'b']
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With