
Python: arguments for using itertools to split a list into groups

Tags: python

This is a question about the relative merits of fast code that uses the standard library but is obscure (at least to me) versus a hand-rolled alternative. In this thread (and others that it duplicates), it seems the "Pythonic" way to split a list into groups is to use itertools, as in the first function in the code example below (modified slightly from ΤΖΩΤΖΙΟΥ).

The reason I prefer the second function is that I can understand how it works, and if I don't need padding (turning a DNA sequence into codons, say), I can reproduce it from memory in an instant.

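Speed favors itertools, though not in every case. In the timings below it wins when we don't need a list back (1.46 vs. 2.39 usec) or when we want the last entry padded (3.99 vs. 4.67 usec); the hand-rolled version is faster only when we want an unpadded list.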

What other arguments are there in favor of the standard library solution?

# Python 2 code: izip_longest and xrange are named zip_longest and range in Python 3
from itertools import izip_longest

def groupby_itertools(iterable, n=3, padvalue='x'):
    "groupby_itertools('abcde', 3, 'x') --> ('a','b','c'), ('d','e','x')"
    return izip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

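def groupby_my(L, n=3, pad=None):
    "groupby_my(list('abcde'), n=3, pad='x') --> [['a','b','c'], ['d','e','x']]"
    # slice L into consecutive pieces of length n; the last piece may be shorter
    R = xrange(0,len(L),n)
    rL = [L[i:i+n] for i in R]
    # pad the last piece up to length n (skipped when L is empty)
    if pad is not None and rL:
        last = rL[-1]
        x = n - len(last)
        if isinstance(last,list):
            rL[-1].extend([pad] * x)
        elif isinstance(last,str):
            rL[-1] += pad * x
    return rL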
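
For readers who find the itertools one-liner obscure, here is a minimal Python 3 sketch of why it works, next to the slicing approach. The names `grouper` and `chunks` are illustrative and do not come from the question; `zip_longest` is the Python 3 name for `izip_longest`.

```python
from itertools import zip_longest  # Python 3 name for izip_longest


def grouper(iterable, n=3, padvalue='x'):
    """Yield tuples of n consecutive items, padding the last one."""
    # A single iterator object, referenced n times. zip_longest takes one
    # item from each of its arguments in turn, but all n arguments are the
    # same iterator, so each output tuple consumes n consecutive items.
    shared = iter(iterable)
    return zip_longest(*[shared] * n, fillvalue=padvalue)


def chunks(seq, n=3):
    """Slice a sequence into pieces of length n; the last may be shorter."""
    return [seq[i:i + n] for i in range(0, len(seq), n)]


print(list(grouper('abcde', 3, 'x')))  # [('a', 'b', 'c'), ('d', 'e', 'x')]
print(chunks('ATGGCCTAA'))             # ['ATG', 'GCC', 'TAA']
```

The slicing version needs a sequence that supports `len` and slicing, while the iterator version accepts any iterable (a file, a generator) and produces its groups lazily.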

timing:

$ python -mtimeit -s 'from groups import groupby_my, groupby_itertools;  L = list("abcdefghijk")' 'groupby_my(L)'
100000 loops, best of 3: 2.39 usec per loop

$ python -mtimeit -s 'from groups import groupby_my, groupby_itertools;  L = list("abcdefghijk")' 'groupby_my(L[:-1],pad="x")'
100000 loops, best of 3: 4.67 usec per loop

$ python -mtimeit -s 'from groups import groupby_my, groupby_itertools;  L = list("abcdefghijk")' 'groupby_itertools(L)'
1000000 loops, best of 3: 1.46 usec per loop

$ python -mtimeit -s 'from groups import groupby_my, groupby_itertools;  L = list("abcdefghijk")' 'list(groupby_itertools(L))'
100000 loops, best of 3: 3.99 usec per loop
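
A rough way to repeat this comparison on a current interpreter is to use the `timeit` module from within Python 3. This is only a sketch: `chunk_lazy`, `chunk_slices`, and `best_usec` are hypothetical names, the functions are simplified Python 3 stand-ins for the two in the question, and the absolute numbers will differ from the Python 2 figures above.

```python
import timeit
from itertools import zip_longest


def chunk_lazy(iterable, n=3, padvalue='x'):
    # Python 3 stand-in for the itertools version in the question
    return zip_longest(*[iter(iterable)] * n, fillvalue=padvalue)


def chunk_slices(seq, n=3):
    # Python 3 stand-in for the hand-rolled version, without padding
    return [seq[i:i + n] for i in range(0, len(seq), n)]


L = list("abcdefghijk")


def best_usec(func, number=20000, repeat=3):
    # Best of `repeat` runs, reported as microseconds per call
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number * 1e6


results = {
    "slices, list": best_usec(lambda: chunk_slices(L)),
    "itertools, lazy": best_usec(lambda: chunk_lazy(L)),
    "itertools, list": best_usec(lambda: list(chunk_lazy(L))),
}
for label, usec in results.items():
    print(f"{label}: {usec:.2f} usec per loop")
```

The lambda wrappers add a small constant overhead to every measurement, so the relative ordering is more meaningful than the individual values.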

Edit: I would rename the functions here (see Alex's answer), but the names appear in so many places that I decided to post this warning instead.

telliott99 asked Jan 19 '10 17:01


2 Answers

When you reuse tools from the standard library, rather than "reinventing the wheel" by coding them yourself from scratch, you're not only getting well-optimized and tuned software (sometimes amazingly so, as often in the case of itertools components): more importantly, you're getting large amounts of functionality that you don't have to test, debug and maintain yourself -- you're leveraging all the testing, debugging and maintenance work of many splendid programmers who contribute to the standard library!

The investment in understanding what the standard library offers you therefore repays itself rapidly, and many times over -- and you'll be able to "reproduce from memory" just as well as for reinvented-wheel code, indeed probably better thanks to the higher amount of reuse.

By the way, the term "group by" has a well defined, idiomatic meaning for most programmers, thanks to its use in SQL (and the similar use in itertools itself): I would therefore suggest you avoid using it for something completely different -- that's only going to breed confusion any time you're collaborating with anybody else (hopefully often, since the heyday of the solo, "cowboy" programmer is long gone -- another argument in favor of standards and against wheel reinvention;-).

Lastly, your docstring doesn't match your functions' signature -- arguments-order error;-).
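
To illustrate the conventional meaning of "group by" mentioned above, here is a small sketch of what `itertools.groupby` does: it collects consecutive items that share a key, which is a different operation from cutting a list into fixed-size pieces. The sample data is made up for illustration.

```python
from itertools import groupby

# groupby collects *consecutive* items with the same key, so the input
# is normally sorted by that key first (much like ORDER BY before GROUP BY).
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']
by_initial = {key: list(group)
              for key, group in groupby(words, key=lambda w: w[0])}
print(by_initial)
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# Without sorting, equal keys that are not adjacent form separate groups.
runs = [(key, len(list(group))) for key, group in groupby('AAABBA')]
print(runs)  # [('A', 3), ('B', 2), ('A', 1)]
```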

Alex Martelli answered Sep 28 '22 15:09


Time spent learning the fundamentals of Python will pay off in spades later on. Therefore, learn itertools, and how groupby works. Not only is using itertools likely to be faster than any hand-rolled solutions, it will help you write better code in the future.

unutbu answered Sep 28 '22 16:09