 

Pandas groupby object filtering

I have a pandas DataFrame with the following columns:

df.columns
Index([u'car_id', u'color', u'make', u'year'])

I would like to create a new FILTERABLE object that holds the count of each (color, make, year) group:

grp = df[['color','make','year']].groupby(['color','make','year']).size()

which returns something like this:

color   make    year   count
black   honda   2011   416

I would like to be able to filter it, however when I try this:

grp.filter(lambda x: x['color']=='black')

I receive this error:

TypeError: 'function' object is not iterable

How can I filter the rows of the result of a 'groupby'?

chattrat423 asked Sep 12 '16 19:09



2 Answers

I think you need to add reset_index so that the output is a DataFrame. Then use boolean indexing:

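# Parentheses are needed so the chained calls can span several lines
df = (df[['color','make','year']]
        .groupby(['color','make','year'])
        .size()
        .reset_index(name='count'))

# 'color' is an ordinary column again, so a boolean mask works
df1 = df[df.color == 'black']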
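
For reference, here is a minimal runnable sketch of the same idea. The sample rows are made up for illustration and are not the asker's data, and the names counts and black are only placeholders:

```python
import pandas as pd

# Hypothetical sample data with the same columns as in the question.
df = pd.DataFrame({
    "car_id": [1, 2, 3, 4, 5],
    "color": ["black", "black", "red", "black", "red"],
    "make": ["honda", "honda", "ford", "ford", "ford"],
    "year": [2011, 2011, 2012, 2011, 2012],
})

cols = ["color", "make", "year"]

# size() returns a Series indexed by (color, make, year);
# reset_index(name="count") turns it into an ordinary DataFrame.
counts = df[cols].groupby(cols).size().reset_index(name="count")

# Boolean indexing works because 'color' is a regular column again.
black = counts[counts.color == "black"]
print(black)
```

Keeping the counts in a separate variable, rather than overwriting df, leaves the original frame available for further grouping.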
jezrael answered Oct 21 '22 21:10


Option 1
Filter ahead of time

cols = ['color','make','year']
df.loc[df.color == 'black', cols].groupby(cols).size()
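
A small self-contained sketch of this option, assuming a made-up frame with the question's columns (the rows and the name black_counts are hypothetical):

```python
import pandas as pd

# Hypothetical sample data with the same columns as in the question.
df = pd.DataFrame({
    "car_id": [1, 2, 3, 4, 5],
    "color": ["black", "black", "red", "black", "red"],
    "make": ["honda", "honda", "ford", "ford", "ford"],
    "year": [2011, 2011, 2012, 2011, 2012],
})

cols = ["color", "make", "year"]

# .loc accepts a boolean row mask plus a list of column labels,
# so only the black cars reach the groupby.
black_counts = df.loc[df.color == "black", cols].groupby(cols).size()
print(black_counts)
```

Filtering first means the groupby only ever sees the rows of interest, which can matter on a large frame.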

Option 2
Use xs for index cross sections

cols = ['color','make','year']
grp = df[cols].groupby(cols).size()

# xs must be called on grp, whose index has the color/make/year levels
grp.xs('black', level='color', drop_level=False)

or

grp.xs('honda', level='make', drop_level=False)

or

grp.xs(2011, level='year', drop_level=False)
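
As a rough end-to-end sketch of the cross-section approach, assuming a made-up frame (the rows and the names black and ford are hypothetical):

```python
import pandas as pd

# Hypothetical sample data with the same columns as in the question.
df = pd.DataFrame({
    "car_id": [1, 2, 3, 4, 5],
    "color": ["black", "black", "red", "black", "red"],
    "make": ["honda", "honda", "ford", "ford", "ford"],
    "year": [2011, 2011, 2012, 2011, 2012],
})

cols = ["color", "make", "year"]

# grp is a Series whose MultiIndex levels are color, make and year.
grp = df[cols].groupby(cols).size()

# Select one value of a single index level; drop_level=False keeps
# that level in the result instead of removing it.
black = grp.xs("black", level="color", drop_level=False)
ford = grp.xs("ford", level="make", drop_level=False)

print(black)
print(ford)
```

This keeps the counts as a Series with the grouping keys in the index, so no reset_index is needed before selecting.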
piRSquared answered Oct 21 '22 19:10