Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

python groupby behaviour?

>>from itertools import groupby
>>keyfunc = lambda x : x > 500
>>obj = dict(groupby(range(1000), keyfunc))
>>list(obj[True])
[999]
>>list(obj[False])
[]

range(1000) is obviously sorted by default for the condition (x > 500).
I was expecting the numbers from 0 to 999 to be grouped in a dict by the condition (x > 500). But the resulting dictionary had only 999.
where are the other numbers?. Can any one explain what is happening here?

like image 882
nathan Avatar asked Jun 04 '11 10:06

nathan


People also ask

What does DF groupby ([ genre ]) do?

groupby() function is used to split the data into groups based on some criteria. pandas objects can be split on any of their axes. The abstract definition of grouping is to provide a mapping of labels to group names.

Is groupby faster on index pandas?

Although Groupby is much faster than Pandas GroupBy. apply and GroupBy. transform with user-defined functions, Pandas is much faster with common functions like mean and sum because they are implemented in Cython.

Does pandas groupby keep order?

Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group.

What kind of object does groupby return?

GroupBy objects are returned by groupby calls: pandas. DataFrame.


1 Answers

From the docs:

The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list[.]

And you are storing iterators in obj and materializing them later.

In [21]: dict((k, list(g)) for k, g in groupby(range(10), lambda x : x > 5))
Out[21]: {False: [0, 1, 2, 3, 4, 5], True: [6, 7, 8, 9]}
like image 136
wRAR Avatar answered Sep 19 '22 22:09

wRAR