You can also call reset_index() on your groupby result to get back a DataFrame with the name column accessible again. If you perform an operation on a single column, the result is a Series indexed by the group keys (a MultiIndex when you group on several keys); you can simply wrap it in pd.DataFrame and then call reset_index().
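As a minimal sketch of the above (the frame and its column names "name", "year" and "score" are made up for illustration):

```python
import pandas as pd

# Hypothetical data; the column names are assumptions, not from the question.
df = pd.DataFrame({
    "name": ["a", "a", "b", "b"],
    "year": [2020, 2021, 2020, 2020],
    "score": [1, 2, 3, 4],
})

# Aggregating the frame puts the group keys into the index.
totals = df.groupby(["name", "year"]).sum()

# reset_index() turns the keys back into ordinary columns.
totals_flat = totals.reset_index()

# Aggregating a single column gives a Series with a MultiIndex.
score_sum = df.groupby(["name", "year"])["score"].sum()

# Wrap the Series in a DataFrame, then flatten the index.
score_flat = pd.DataFrame(score_sum).reset_index()
print(score_flat)
```

Calling score_sum.reset_index() directly should give the same frame, since Series.reset_index() returns a DataFrame by default.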
You can group DataFrame rows into lists by calling pandas.DataFrame.groupby() on the column of interest, selecting the column you want as a list from each group, and then using Series.apply(list) to get the list for every group.
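For instance, a short sketch with a made-up frame (the "team" and "player" columns are assumptions for illustration):

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({
    "team": ["x", "y", "x", "y", "x"],
    "player": ["ann", "bob", "cat", "dan", "eve"],
})

# Group on "team", select "player", and collect each group's values in a list.
players_by_team = df.groupby("team")["player"].apply(list)
print(players_by_team)
```

The result is a Series indexed by team, so players_by_team["x"] gives the list for that group.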
The groupby() function splits the data into groups based on some criteria. pandas objects can be split on any of their axes. Abstractly, grouping means providing a mapping of labels to group names. The sort parameter controls whether the group keys are sorted.
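To see what the sort parameter does, here is a small sketch on made-up data (the "key" and "val" columns are assumptions). With the default sort=True the group keys come out sorted; with sort=False they keep the order in which they first appear:

```python
import pandas as pd

# Hypothetical data; keys deliberately appear out of alphabetical order.
df = pd.DataFrame({"key": ["b", "a", "b", "c"], "val": [1, 2, 3, 4]})

# Default behaviour: group keys are sorted.
sorted_keys = df.groupby("key")["val"].sum().index.tolist()

# sort=False: group keys keep their order of first appearance.
unsorted_keys = df.groupby("key", sort=False)["val"].sum().index.tolist()

print(sorted_keys)
print(unsorted_keys)
```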
You can use the get_group method:
In [21]: gb.get_group('foo')
Out[21]:
A B C
0 foo 1.624345 5
2 foo -0.528172 11
4 foo 0.865408 14
Note: This doesn't require creating an intermediate dictionary or a copy of every sub-DataFrame for every group, so it is much more memory-efficient than building the naive dictionary with dict(iter(gb)). This is because it uses data structures already available in the groupby object.
You can select different columns using the groupby slicing:
In [22]: gb[["A", "B"]].get_group("foo")
Out[22]:
A B
0 foo 1.624345
2 foo -0.528172
4 foo 0.865408
In [23]: gb["C"].get_group("foo")
Out[23]:
0 5
2 11
4 14
Name: C, dtype: int64
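If you want to try this yourself, here is a self-contained sketch. The frame is a stand-in I made up with the same column names A, B and C; its values are not the ones from the session above.

```python
import pandas as pd

# Stand-in data with the same column names as the session above
# (the values are hypothetical).
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo"],
    "B": [1.5, -0.5, 2.5, 0.25, 3.5],
    "C": [5, 8, 11, 9, 14],
})
gb = df.groupby("A")

# All rows belonging to the "foo" group.
foo_rows = gb.get_group("foo")

# Only column "C" of the "foo" group, returned as a Series.
foo_c = gb["C"].get_group("foo")

print(foo_rows)
print(foo_c)
```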
Wes McKinney (pandas' author) in Python for Data Analysis provides the following recipe:
groups = dict(list(gb))
which returns a dictionary whose keys are your group labels and whose values are DataFrames, i.e.
groups['foo']
will yield what you are looking for:
A B C
0 foo 1.624345 5
2 foo -0.528172 11
4 foo 0.865408 14
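A quick sketch to check that the dictionary recipe and get_group agree, again on a made-up frame with columns A, B and C. Note that it groups by the plain column name "A"; grouping by a one-element list may give tuple keys in recent pandas versions.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo"],
    "B": [1.5, -0.5, 2.5, 0.25, 3.5],
    "C": [5, 8, 11, 9, 14],
})
gb = df.groupby("A")

# Iterating a groupby yields (key, sub-DataFrame) pairs,
# so dict() turns them into a key -> DataFrame mapping.
groups = dict(list(gb))

# Both routes should give the same rows for "foo".
same = groups["foo"].equals(gb.get_group("foo"))
print(sorted(groups), same)
```

The dictionary materialises every group at once, which is convenient for small data but costs more memory than calling get_group for the one group you need.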
Rather than gb.get_group('foo'), I prefer using gb.groups:
df.loc[gb.groups['foo']]
This way you can select specific columns as well. For example:
df.loc[gb.groups['foo'], ['A', 'B']]
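Here is a runnable sketch of the gb.groups approach on a made-up frame (the values are hypothetical). gb.groups maps each key to the index labels of its rows, so those labels can be passed straight to df.loc together with a column list:

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo"],
    "B": [1.0, 2.0, 3.0, 4.0, 5.0],
    "C": [5, 8, 11, 9, 14],
})
gb = df.groupby("A")

# Row labels of the "foo" group.
foo_labels = gb.groups["foo"]

# Select those rows and only columns A and B.
foo_ab = df.loc[foo_labels, ["A", "B"]]
print(foo_ab)
```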
gb = df.groupby('A')
gb_groups = gb.groups
If you are only interested in certain groups, list the available keys with gb_groups.keys() and put the ones you want into key_list:
gb_groups.keys()
key_list = [key1, key2, key3]  # and so on; substitute your own keys
for key, values in gb_groups.items():
    if key in key_list:
        # values holds the row labels of this group
        print(df.loc[values], "\n")
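A self-contained variant of the same idea, collecting the selected groups in a dictionary instead of printing them. The frame and the keys in key_list are made up for illustration:

```python
import pandas as pd

# Hypothetical data with three groups.
df = pd.DataFrame({
    "A": ["foo", "bar", "baz", "foo", "bar", "baz"],
    "B": [1, 2, 3, 4, 5, 6],
})
gb = df.groupby("A")

# Keys of the groups we want to keep (an assumption for this example).
key_list = ["foo", "baz"]

# gb.groups maps each key to the row labels of that group.
selected = {
    key: df.loc[labels]
    for key, labels in gb.groups.items()
    if key in key_list
}

for key, sub_df in selected.items():
    print(key)
    print(sub_df, "\n")
```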
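I was looking for a way to sample a few members of the GroupBy object and had to solve the posted question to get this done.
import random
# Create a groupby object based on the some_key column
grouped = df.groupby('some_key')
# Pick N group keys at random (random.sample needs a sequence, hence list())
sampled_df_i = random.sample(list(grouped.indices), N)
# Grab the corresponding groups
df_list = [grouped.get_group(df_i) for df_i in sampled_df_i]
# Optionally, turn it all back into a single DataFrame
sampled_df = pd.concat(df_list, axis=0, join='outer')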
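For completeness, a runnable sketch of the sampling recipe on made-up data. The frame, the value of N and the seed are assumptions chosen only so the example can be run end to end:

```python
import random

import pandas as pd

random.seed(0)  # fixed seed so repeated runs pick the same groups

# Hypothetical data: four groups of two rows each.
df = pd.DataFrame({
    "some_key": list("aabbccdd"),
    "value": range(8),
})
grouped = df.groupby("some_key")

N = 2  # number of groups to sample (an assumption)

# grouped.indices is a dict keyed by group label; sample N of its keys.
sampled_keys = random.sample(list(grouped.indices), N)

# Fetch each sampled group and stack them into one DataFrame.
df_list = [grouped.get_group(key) for key in sampled_keys]
sampled_df = pd.concat(df_list, axis=0, join="outer")
print(sampled_df)
```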