I am creating a groupby object from a Pandas DataFrame and want to select all the groups with size > 1.
Example:
     A  B
0  foo  0
1  bar  1
2  foo  2
3  foo  3
The following doesn't seem to work:
grouped = df.groupby('A')
grouped[grouped.size > 1]
Expected Result:
A
foo  0
     2
     3
To sort a grouped DataFrame in ascending or descending order, use sort_values(). On a groupby object, size() returns the number of rows in each group; note that it is a method and must be called with parentheses.
If you want to get a single value for each group, use aggregate() (or one of its shortcuts). If you want to get a subset of the original rows, use filter().
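As a minimal sketch of that distinction, the snippet below rebuilds the DataFrame from the question and applies both methods. The variable names (sums, multi_row) and the choice of summing column B are illustrative assumptions, not part of the original post.

```python
import pandas as pd

# Sample data taken from the question above.
df = pd.DataFrame({"A": ["foo", "bar", "foo", "foo"], "B": [0, 1, 2, 3]})
grouped = df.groupby("A")

# aggregate(): one value per group (here, the sum of B in each group).
sums = grouped["B"].aggregate("sum")

# filter(): keeps the original rows of every group that passes the test.
multi_row = grouped.filter(lambda g: len(g) > 1)

print(sums)
print(multi_row)
```

The aggregate() result is indexed by group key, while the filter() result keeps the original row index, which is why filter() is the one that fits this question.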
For string data you can use the pandas string methods to filter rows. Series.str.startswith() returns a boolean mask that is True where a column's values start with a given prefix, and Series.str.endswith() does the same for values that end with a given suffix. Indexing the DataFrame with the mask returns the matching rows.
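A short sketch of string-based filtering on the question's data follows. The prefix "f" and suffix "r" are arbitrary choices made for illustration only.

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "bar", "foo", "foo"], "B": [0, 1, 2, 3]})

# str.startswith() / str.endswith() produce a boolean Series;
# indexing df with that Series keeps only the rows where it is True.
starts_with_f = df[df["A"].str.startswith("f")]
ends_with_r = df[df["A"].str.endswith("r")]

print(starts_with_f)
print(ends_with_r)
```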
To sort within the groups of a groupby() result, combine DataFrame.sort_values(), which sorts a DataFrame in ascending or descending order, with DataFrame.groupby(). Note that groupby preserves the order of rows within each group.
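One possible way to sort within groups is sketched below: sort by the group key and the value column first, then group. The descending order on B and the use of head(1) are assumptions chosen to make the effect visible.

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "bar", "foo", "foo"], "B": [0, 1, 2, 3]})

# Sort by the group key first, then by B in descending order inside each key.
sorted_df = df.sort_values(["A", "B"], ascending=[True, False])

# groupby keeps the row order within each group, so head(1) now
# returns the row with the largest B for every value of A.
largest_per_group = sorted_df.groupby("A").head(1)

print(sorted_df)
print(largest_per_group)
```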
As of pandas 0.12 you can do:
>>> grouped.filter(lambda x: len(x) > 1)
     A  B
0  foo  0
2  foo  2
3  foo  3
I have found transform to be much more efficient than filter for very large dataframes:
element_group_sizes = df['A'].groupby(df['A']).transform('size')
df[element_group_sizes > 1]
Or, in one line:
df[df['A'].groupby(df['A']).transform('size') > 1]
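For reference, here is a self-contained sketch that runs both approaches on the question's data and checks that they select the same rows. The names via_filter, group_sizes, and via_transform are illustrative, and the snippet does not measure the speed difference mentioned above.

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "bar", "foo", "foo"], "B": [0, 1, 2, 3]})

# Approach 1: filter() calls the lambda once per group.
via_filter = df.groupby("A").filter(lambda g: len(g) > 1)

# Approach 2: transform('size') broadcasts each group's size back to
# every row, giving a Series aligned with df that works as a mask.
group_sizes = df.groupby("A")["A"].transform("size")
via_transform = df[group_sizes > 1]

print(via_transform)
print(via_filter.equals(via_transform))
```

The transform version avoids a Python-level function call per group, which is a plausible reason for it scaling better on frames with many groups.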