This is a question about the relative merits of fast code that uses the standard library but is obscure (at least to me) versus a hand-rolled alternative. In this thread (and others that it duplicates), it seems the "Pythonic" way to split a list into groups is to use itertools, as in the first function in the code example below (modified slightly from ΤΖΩΤΖΙΟΥ).
The reason I prefer the second function is that I can understand how it works, and if I don't need padding (turning a DNA sequence into codons, say), I can reproduce it from memory in an instant.
The speed is better with itertools, particularly if we don't need a list back or if we want the last entry padded (see the timings below).
What other arguments are there in favor of the standard library solution?
from itertools import izip_longest

def groupby_itertools(iterable, n=3, padvalue='x'):
    "groupby_itertools('abcde', 3, 'x') --> ('a','b','c'), ('d','e','x')"
    return izip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

def groupby_my(L, n=3, pad=None):
    "groupby_my(list('abcde'), n=3, pad='x') --> [['a','b','c'], ['d','e','x']]"
    R = xrange(0, len(L), n)
    rL = [L[i:i+n] for i in R]
    if pad is not None and rL:  # guard against empty input
        last = rL[-1]
        x = n - len(last)
        if isinstance(last, list):
            rL[-1].extend([pad] * x)
        elif isinstance(last, str):
            rL[-1] += pad * x
    return rL
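The part I found obscure is *[iter(iterable)]*n. Unrolled, it is just n references to a single iterator, which izip_longest drains round-robin, n items per tuple. A quick sketch (Python 2, to match the code above):

from itertools import izip_longest

it = iter('abcde')
# it, it, it are the SAME iterator, not three independent ones,
# so each output tuple consumes the next three items.
print list(izip_longest(it, it, it, fillvalue='x'))
# [('a', 'b', 'c'), ('d', 'e', 'x')]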
Timing:
$ python -mtimeit -s 'from groups import groupby_my, groupby_itertools; L = list("abcdefghijk")' 'groupby_my(L)'
100000 loops, best of 3: 2.39 usec per loop
$ python -mtimeit -s 'from groups import groupby_my, groupby_itertools; L = list("abcdefghijk")' 'groupby_my(L[:-1],pad="x")'
100000 loops, best of 3: 4.67 usec per loop
$ python -mtimeit -s 'from groups import groupby_my, groupby_itertools; L = list("abcdefghijk")' 'groupby_itertools(L)'
1000000 loops, best of 3: 1.46 usec per loop
$ python -mtimeit -s 'from groups import groupby_my, groupby_itertools; L = list("abcdefghijk")' 'list(groupby_itertools(L))'
100000 loops, best of 3: 3.99 usec per loop
Edit: I would change the function names here (see Alex's answer), but there are so many references to them that I decided to post this warning instead.
When you reuse tools from the standard library, rather than "reinventing the wheel" by coding them yourself from scratch, you're not only getting well-optimized and tuned software (sometimes amazingly so, as is often the case with itertools components); more importantly, you're getting a large amount of functionality that you don't have to test, debug, and maintain yourself -- you're leveraging all the testing, debugging, and maintenance work of the many splendid programmers who contribute to the standard library!
The investment in understanding what the standard library offers therefore repays itself rapidly, and many times over -- and you'll be able to "reproduce from memory" just as well as with reinvented-wheel code, indeed probably better, thanks to the greater amount of reuse.
By the way, the term "group by" has a well-defined, idiomatic meaning for most programmers, thanks to its use in SQL (and the similar use in itertools itself): I would therefore suggest you avoid using it for something completely different -- that will only breed confusion whenever you collaborate with anybody else (hopefully often, since the heyday of the solo "cowboy" programmer is long gone -- another argument in favor of standards and against wheel reinvention ;-).
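To make the distinction concrete: itertools.groupby clusters consecutive items by a key function, SQL-style -- it does not cut a sequence into fixed-size chunks. A tiny sketch (the word list is made up):

from itertools import groupby

words = ['apple', 'ant', 'bee', 'cat', 'cow']
# Consecutive items sharing a key end up in one group.
for letter, group in groupby(words, key=lambda w: w[0]):
    print letter, list(group)
# a ['apple', 'ant']
# b ['bee']
# c ['cat', 'cow']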
Lastly, your docstring doesn't match your functions' signature -- an arguments-order error ;-).
Time spent learning the fundamentals of Python will pay off in spades later on. Therefore, learn itertools and how groupby works. Not only is using itertools likely to be faster than any hand-rolled solution, but it will also help you write better code in the future.
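And if you don't want padding (the codon case from the question), itertools still covers it; here is a minimal sketch using islice -- the name chunks is mine, purely for illustration:

from itertools import islice

def chunks(iterable, n=3):
    # Slice n items at a time off one iterator; the final chunk is
    # simply shorter when the length isn't a multiple of n.
    it = iter(iterable)
    while True:
        chunk = list(islice(it, n))
        if not chunk:
            break
        yield chunk

print list(chunks('ATGGCGTA'))
# [['A', 'T', 'G'], ['G', 'C', 'G'], ['T', 'A']]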