>>from itertools import groupby >>keyfunc = lambda x : x > 500 >>obj = dict(groupby(range(1000), keyfunc)) >>list(obj[True]) [999] >>list(obj[False]) []
range(1000) is obviously sorted by default for the condition (x > 500).
I was expecting the numbers from 0 to 999 to be grouped in a dict by the condition (x > 500). But the resulting dictionary had only 999.
where are the other numbers?.
Can any one explain what is happening here?
groupby() function is used to split the data into groups based on some criteria. pandas objects can be split on any of their axes. The abstract definition of grouping is to provide a mapping of labels to group names.
Although Groupby is much faster than Pandas GroupBy. apply and GroupBy. transform with user-defined functions, Pandas is much faster with common functions like mean and sum because they are implemented in Cython.
Note this does not influence the order of observations within each group. Groupby preserves the order of rows within each group.
GroupBy objects are returned by groupby calls: pandas. DataFrame.
From the docs:
The returned group is itself an iterator that shares the underlying iterable with
groupby()
. Because the source is shared, when thegroupby()
object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list[.]
And you are storing iterators in obj
and materializing them later.
In [21]: dict((k, list(g)) for k, g in groupby(range(10), lambda x : x > 5))
Out[21]: {False: [0, 1, 2, 3, 4, 5], True: [6, 7, 8, 9]}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With