Select multiple groups from pandas groupby object

I am experimenting with the groupby features of pandas, in particular

gb = df.groupby('model')
gb.hist()

Since gb has 50 groups, the result is quite cluttered. I would like to explore the result for the first 5 groups only.

I found how to select a single group with groups or get_group (How to access pandas groupby dataframe by key), but not how to select multiple groups directly. The best I could do is:

groups = dict(list(gb))
subgroup = pd.concat(list(groups.values())[:5])
subgroup.groupby('model').hist()

Is there a more direct way?

You can do something like

new_gb = pandas.concat( [ gb.get_group(group) for i,group in enumerate( gb.groups) if i < 5 ] ).groupby('model')    
new_gb.hist()
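
For reference, here is a self-contained sketch of the same idea that can be run on its own. The toy DataFrame (10 models with 5 rows each) and the names first_keys and subset are assumptions made for illustration; they are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's df: 10 models, 5 rows each.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "model": np.repeat(np.arange(10), 5),
    "param1": rng.random(50),
})

gb = df.groupby("model")

# gb.groups is a dict-like mapping of group key -> row labels,
# so taking the first five keys selects the first five groups.
first_keys = list(gb.groups)[:5]

# Pull each selected group out with get_group and stitch them together.
subset = pd.concat([gb.get_group(key) for key in first_keys])

# Regroup the reduced frame; new_gb.hist() would now plot only these groups.
new_gb = subset.groupby("model")
print(new_gb.ngroups)
```

Slicing the key list keeps the selection logic in one place, so picking a different set of groups only means changing how first_keys is built.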

However, I would approach it differently. You can use a collections.Counter object to quickly count the rows in each group:

import collections

import numpy as np
import pandas

# Example frame: 10 rows, with model drawn at random from {0, 1, 2}
df = pandas.DataFrame({'model': np.random.randint(0, 3, 10), 'param1': np.random.random(10), 'param2': np.random.random(10)})
#   model    param1    param2
#0      2  0.252379  0.985290
#1      1  0.059338  0.225166
#2      0  0.187259  0.808899
#3      2  0.773946  0.696001
#4      1  0.680231  0.271874
#5      2  0.054969  0.328743
#6      0  0.734828  0.273234
#7      0  0.776684  0.661741
#8      2  0.098836  0.013047
#9      1  0.228801  0.827378
model_groups = collections.Counter(df.model)
print(model_groups) #Counter({2: 4, 0: 3, 1: 3})

Now you can iterate over the Counter object like a dictionary, and query the groups you want:

new_df = pandas.concat( [df.query('model==%d'%key) for key,val in model_groups.items() if val < 4 ] ) # for example, but you can select the models however you like  
#   model    param1    param2
#2      0  0.187259  0.808899
#6      0  0.734828  0.273234
#7      0  0.776684  0.661741
#1      1  0.059338  0.225166
#4      1  0.680231  0.271874
#9      1  0.228801  0.827378

Now you can use the built-in pandas.DataFrame.groupby function:

gb = new_df.groupby('model')
gb.hist()

Since model_groups contains all of the groups, you can just pick from it as you wish.

note

If your model column contains strings (names, for example) instead of integers, everything works the same way; just change the query argument from 'model==%d'%key to 'model=="%s"'%key.
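
As an alternative to building a query string per key, the selection can be done with Series.isin, which does not care whether the keys are integers or strings. This is a swapped-in technique, not the query-based approach above, and the toy data and the name wanted are assumptions made for illustration.

```python
import collections

import pandas as pd

# Hypothetical data with string model names: a appears 3 times, b 2, c 4.
df = pd.DataFrame({
    "model": ["a", "b", "a", "c", "b", "a", "c", "c", "c"],
    "param1": range(9),
})

# Count the rows per model, as in the Counter approach above.
model_groups = collections.Counter(df["model"])

# Keep the models with fewer than 4 rows (any other criterion works too).
wanted = [key for key, count in model_groups.items() if count < 4]

# One boolean mask selects all wanted models; no string formatting needed.
new_df = df[df["model"].isin(wanted)]
gb = new_df.groupby("model")
print(gb.size())
```

Unlike concatenating one query result per key, the mask keeps the rows in their original order.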

It'd be easier to just filter your df first and then perform the groupby:

In [155]:

df = pd.DataFrame({'model':np.random.randint(1,10,100), 'value':np.random.randn(100)})
first_five = df['model'].sort_values().unique()[:5]
gp = df[df['model'].isin(first_five)].groupby('model')
gp.first()
Out[155]:
          value
model          
1     -0.505677
2      1.217027
3     -0.641583
4      0.778104
5     -1.037858
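
The filter-then-group steps can be wrapped in a small function if they are needed more than once. The sketch below assumes "first n groups" means the n smallest key values; first_n_groups is a hypothetical helper, not a pandas function, and the sample data is made up so the result is reproducible.

```python
import numpy as np
import pandas as pd


def first_n_groups(df, column, n):
    """Group df by column, keeping only the n smallest distinct keys.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Distinct keys in ascending order, truncated to the first n.
    keys = df[column].drop_duplicates().sort_values().head(n)
    # Filter the rows first, then group what is left.
    return df[df[column].isin(keys)].groupby(column)


# Fixed sample data instead of random values.
df = pd.DataFrame({
    "model": [3, 1, 2, 5, 4, 1, 7, 6, 2, 9],
    "value": np.arange(10.0),
})

gp = first_n_groups(df, "model", 5)
print(gp.size())
```

Filtering before grouping means the rows of the unwanted groups never reach the groupby at all.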

Tags: python, pandas, lib

Answers by: dermen, EdChum
