
Get all keys from GroupBy object in Pandas

Tags:

python

pandas

I'm looking for a way to get a list of all the keys in a GroupBy object, but I can't seem to find one via the docs nor through Google.

There is definitely a way to access the groups through their keys, like so:

    df_gb = df.groupby(['EmployeeNumber'])
    df_gb.get_group(key)

...so I figure there's a way to access a list (or the like) of the keys in a GroupBy object. I'm looking for something like this:

    df_gb.keys

    Out: [1234, 2356, 6894, 9492]

I figure I could just loop through the GroupBy object and get the keys that way, but I think there's got to be a better way.

Nate asked Feb 28 '17 15:02




2 Answers

You can access this via the .groups attribute of the groupby object. It returns a dict whose keys are the group keys:

    In [40]: df = pd.DataFrame({'group':[0,1,1,1,2,2,3,3,3], 'val':np.arange(9)})
             gp = df.groupby('group')
             gp.groups.keys()

    Out[40]: dict_keys([0, 1, 2, 3])

Here is the full output of groups:

    In [41]: gp.groups

    Out[41]:
    {0: Int64Index([0], dtype='int64'),
     1: Int64Index([1, 2, 3], dtype='int64'),
     2: Int64Index([4, 5], dtype='int64'),
     3: Int64Index([6, 7, 8], dtype='int64')}
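If a plain list of keys is wanted rather than a dict_keys view, a minimal sketch (reusing the same example frame as above; the variable names keys and first are only illustrative) could look like this:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'group': [0, 1, 1, 1, 2, 2, 3, 3, 3], 'val': np.arange(9)})
gp = df.groupby('group')

# .groups is dict-like, so list() over it yields the group keys
keys = list(gp.groups)
print(keys)

# each key can be passed back to get_group() to fetch that group's rows
first = gp.get_group(keys[0])
print(first)
```

Note that this still builds the full groups mapping first, so it is a convenience rather than a faster route.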

Update

Because groups is a dict, it looks like the group order isn't maintained when you call keys (at least on Python versions before 3.7, where dicts did not guarantee insertion order):

    In [65]: df = pd.DataFrame({'group':list('bgaaabxeb'), 'val':np.arange(9)})
             gp = df.groupby('group')
             gp.groups.keys()

    Out[65]: dict_keys(['b', 'e', 'g', 'a', 'x'])

If you display groups itself, you can see the keys shown in sorted order:

    In [79]: gp.groups

    Out[79]:
    {'a': Int64Index([2, 3, 4], dtype='int64'),
     'b': Int64Index([0, 5, 8], dtype='int64'),
     'e': Int64Index([7], dtype='int64'),
     'g': Int64Index([1], dtype='int64'),
     'x': Int64Index([6], dtype='int64')}

A hack to get the keys in that order is to access the .name attribute of each group:

    In [78]: gp.apply(lambda x: x.name)

    Out[78]:
    group
    a    a
    b    b
    e    e
    g    g
    x    x
    dtype: object

This isn't great because it isn't vectorised. However, if you already have an aggregated object, you can just get the index values:

    In [81]: agg = gp.sum()
             agg

    Out[81]:
           val
    group
    a        9
    b       13
    e        7
    g        1
    x        6

    In [83]: agg.index.get_level_values(0)

    Out[83]: Index(['a', 'b', 'e', 'g', 'x'], dtype='object', name='group')
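The same idea works with any aggregation, since the result is indexed by the group keys. As a rough sketch, size() is used here only as an example aggregation (it is not part of the original answer), and tolist() turns the index into a plain Python list:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'group': list('bgaaabxeb'), 'val': np.arange(9)})
gp = df.groupby('group')

# size() returns one row per group; its index holds the group keys
sizes = gp.size()
keys = sizes.index.tolist()
print(keys)
```

With the default sort=True in groupby, the keys come back in sorted order.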
EdChum answered Sep 19 '22 13:09


A problem with EdChum's answer is that calling gp.groups.keys() first constructs the full group dictionary. On large dataframes this is a very slow operation, and it effectively doubles the memory consumption. Iterating over the groupby object is much faster:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'group':list('bgaaabxeb'), 'val':np.arange(9)})
    gp = df.groupby('group')
    keys = [key for key, _ in gp]  # iterating yields (key, sub-frame) pairs

Executing this list comprehension took me 16 s on my groupby object, while I had to interrupt gp.groups.keys() after 3 minutes.
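To check this on your own setup, a rough timing sketch follows. The row and group counts are arbitrary assumptions (not the dataset from this answer), the helper names are hypothetical, and the relative speed may differ between pandas versions:

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: sizes are arbitrary assumptions, not the answerer's dataset
rng = np.random.default_rng(0)
n_rows, n_groups = 100_000, 1_000
df = pd.DataFrame({'group': rng.integers(0, n_groups, size=n_rows),
                   'val': np.arange(n_rows)})

def keys_by_iteration():
    # build a fresh groupby on each call so cached results don't skew timing
    return [key for key, _ in df.groupby('group')]

def keys_from_groups():
    # materialises the full key -> row-labels mapping before reading the keys
    return list(df.groupby('group').groups.keys())

t_iter = timeit.timeit(keys_by_iteration, number=1)
t_groups = timeit.timeit(keys_from_groups, number=1)
print(f"iteration: {t_iter:.3f} s, .groups.keys(): {t_groups:.3f} s")

# both approaches should produce the same set of keys
same_keys = set(keys_by_iteration()) == set(keys_from_groups())
print(same_keys)
```

A single run is noisy, so raise number or repeat the measurement before drawing conclusions.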

Dr_Zaszuś answered Sep 23 '22 13:09