Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast iterating over first n items of an iterable (not a list) in python

I'm looking for a pythonic way of iterating over first n items of an iterable (upd: not a list in a common case, as for lists things are trivial), and it's quite important to do this as fast as possible. This is how I do it now:

count = 0
for item in iterable:
 do_something(item)
 count += 1
 if count >= n: break

Doesn't seem neat to me. Another way of doing this is:

for item in itertools.islice(iterable, n):
    do_something(item)

This looks good, the question is it fast enough to use with some generator(s)? For example:

pair_generator = lambda iterable: itertools.izip(*[iter(iterable)]*2)
for item in itertools.islice(pair_generator(iterable), n):
 so_something(item)

Will it run fast enough as compared to the first method? Is there some easier way to do it?

like image 389
martinthenext Avatar asked Apr 23 '10 21:04

martinthenext


People also ask

Is it faster to iterate through list or set Python?

We know how in python, set can be iterate faster than list. What's the reason behind it? Set is implemented by a hash-table data structure. For this reason, checking if a specific value exists in the set, is instant O(1) time, requires no iteration.

What are the methods used with an iterator to iterate through a sequence in Python?

__ier()__ and __next()__ methods are used with an iterator to iterate through the given sequence.


2 Answers

for item in itertools.islice(iterable, n): is the most obvious, easy way to do it. It works for arbitrary iterables and is O(n), like would be any sane solution.

It's conceivable that another solution could have better performance; we wouldn't know without timing. I wouldn't recommend bothering with timing unless you profile your code and find this call to be a hotspot. Unless it's buries within an inner loop, it is highly doubtful that it will be. Premature optimization is the root of all evil.


If I was going to look for alternate solutions, I would look at ones like for count, item in enumerate(iterable): if count > n: break ... and for i in xrange(n): item = next(iterator) .... I wouldn't guess these would help, but they seem to be worth trying if we really want to compare things. If I was stuck in a situation where I profiled and found this was a hotspot in an inner loop (is this really your situation?), I would also try to ease the name lookup from getting the islice attribute of the global iterools to binding the function to a local name already.

These are things you only do after you've proven they'll help. People try doing them other times a lot. It doens't help make their programs appreciably faster; it just makes their programs worse.

like image 60
Mike Graham Avatar answered Oct 22 '22 20:10

Mike Graham


itertools tends to be the fastest solution, when directly applicable.

Obviously, the only way to check is to benchmark -- e.g., save in aaa.py

import itertools

def doit1(iterable, n, do_something=lambda x: None):
  count = 0
  for item in iterable:
   do_something(item)
   count += 1
   if count >= n: break

def doit2(iterable, n, do_something=lambda x: None):
  for item in itertools.islice(iterable, n):
      do_something(item)

pair_generator = lambda iterable: itertools.izip(*[iter(iterable)]*2)

def dd1(itrbl=range(44)): doit1(itrbl, 23)
def dd2(itrbl=range(44)): doit2(itrbl, 23)

and see...:

$ python -mtimeit -s'import aaa' 'aaa.dd1()'
100000 loops, best of 3: 8.82 usec per loop
$ python -mtimeit -s'import aaa' 'aaa.dd2()'
100000 loops, best of 3: 6.33 usec per loop

so clearly, itertools is faster here -- benchmark with your own data to verify.

BTW, I find timeit MUCH more usable from the command line, so that's how I always use it -- it then runs the right "order of magnitude" of loops for the kind of speeds you're specifically trying to measure, be those 10, 100, 1000, and so on -- here, to distinguish a microsecond and a half of difference, a hundred thousand loops is about right.

like image 37
Alex Martelli Avatar answered Oct 22 '22 22:10

Alex Martelli