
Pandas, groupby where column value is greater than x

Tags: python, pandas

I have a table like this:

    timestamp   avg_hr  hr_quality  avg_rr  rr_quality  activity  sleep_summary_id
    1422404668  66      229         0       0           13        78
    1422404670  64      223         0       0           20        78
    1422404672  64      216         0       0           11        78
    1422404674  66      198         0       40          9         78
    1422404676  65      184         0       30          3         78
    1422404678  64      173         0       10          17        78
    1422404680  66      199         0       20          118       78

I'm trying to group the data by timestamp, sleep ID and rr_quality, where rr_quality is > 0.

I've tried the following, and none of them seems to work:

 df3 = df2.groupby([df2.index.hour,'sleep_summary_id',df2['rr_quality']>0])

 df3 = df2.groupby([df2.index.hour,'sleep_summary_id','rr_quality'>0])

 df3 = df2.groupby([df2.index.hour,'sleep_summary_id',['rr_quality']>0])

All of them raise a KeyError.

EDIT:

I also can't seem to pass more than one filter at a time. I tried the following:

df2[df2['rr_quality'] >= 150, df2['hr_quality'] > 200]
df2[df2['rr_quality'] >= 150, ['hr_quality'] > 200]
df2[[df2['rr_quality'] >= 150, ['hr_quality'] > 200]]

returns: TypeError: 'Series' objects are mutable, thus they cannot be hashed

cyberbemon asked Apr 14 '15 16:04




2 Answers

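I know this is old, but I wanted to add that pandas has a built-in method for this: groupby followed by filter. The function passed to filter must return a single True or False per group, so the comparison is reduced with .all(). Adapting the example from the pandas documentation to your case:

grouped_df2 = df2.groupby([df2.index.hour, 'sleep_summary_id', 'rr_quality'])
grouped_df2.filter(lambda x: (x['rr_quality'] > 0).all())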
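
As a rough, self-contained sketch of the groupby-then-filter approach: the rows below are copied from the question, while the DatetimeIndex built from the timestamp column is an assumption (the question's use of df2.index.hour suggests it). The function passed to filter returns one boolean per group, so whole groups are kept or dropped.

```python
import pandas as pd

# Rows copied from the question's table. Building a DatetimeIndex from the
# Unix timestamps is an assumption about how df2 was prepared.
timestamps = [1422404668, 1422404670, 1422404672, 1422404674,
              1422404676, 1422404678, 1422404680]
df2 = pd.DataFrame(
    {
        "avg_hr": [66, 64, 64, 66, 65, 64, 66],
        "rr_quality": [0, 0, 0, 40, 30, 10, 20],
        "sleep_summary_id": [78] * 7,
    },
    index=pd.to_datetime(timestamps, unit="s"),
)

grouped = df2.groupby([df2.index.hour, "sleep_summary_id", "rr_quality"])

# filter() calls the function once per group and expects a single bool back;
# groups for which it returns False are dropped entirely.
kept = grouped.filter(lambda g: (g["rr_quality"] > 0).all())
print(kept)
```

The result is a plain DataFrame containing only the rows from the surviving groups, not a grouped object, so any aggregation needs another groupby call afterwards.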
Czarking answered Oct 27 '22 08:10


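The simplest thing to do here is to filter the df first and then perform the groupby. Take the hour from the filtered frame's index so that the grouping key has the same length as the data being grouped:

filtered = df2[df2['rr_quality'] > 0]
filtered.groupby([filtered.index.hour, 'sleep_summary_id'])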
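
A minimal runnable sketch of filtering first and grouping second, using the rows from the question. The DatetimeIndex and the choice of mean over avg_hr as the aggregation are assumptions made for illustration; the hour is taken from the filtered frame so the grouping key and the data have the same length.

```python
import pandas as pd

# Rows copied from the question's table; the DatetimeIndex is an assumption.
timestamps = [1422404668, 1422404670, 1422404672, 1422404674,
              1422404676, 1422404678, 1422404680]
df2 = pd.DataFrame(
    {
        "avg_hr": [66, 64, 64, 66, 65, 64, 66],
        "rr_quality": [0, 0, 0, 40, 30, 10, 20],
        "sleep_summary_id": [78] * 7,
    },
    index=pd.to_datetime(timestamps, unit="s"),
)

# Step 1: keep only the rows with a positive rr_quality.
positive = df2[df2["rr_quality"] > 0]

# Step 2: group the filtered rows by hour of day and sleep id,
# then aggregate (mean of avg_hr is just an illustrative choice).
result = positive.groupby([positive.index.hour, "sleep_summary_id"])["avg_hr"].mean()
print(result)
```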

EDIT

If you're intending to assign this back to your original df:

mask = df2['rr_quality'] > 0
sub = df2[mask]
df2.loc[mask, 'AVG_HR'] = sub.groupby([sub.index.hour, 'sleep_summary_id'])['avg_hr'].transform('mean')

The loc call masks the left-hand side so that the result of the transform aligns correctly with the selected rows. Use the same condition on both sides; rows that fail it are left as NaN in the new column.
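
To see the alignment in practice, here is a small sketch on the question's rows. It assumes a DatetimeIndex with unique timestamps and the column name sleep_summary_id from the question's table; AVG_HR is the new column name used in this answer.

```python
import pandas as pd

# Rows copied from the question's table; the DatetimeIndex is an assumption.
timestamps = [1422404668, 1422404670, 1422404672, 1422404674,
              1422404676, 1422404678, 1422404680]
df2 = pd.DataFrame(
    {
        "avg_hr": [66, 64, 64, 66, 65, 64, 66],
        "rr_quality": [0, 0, 0, 40, 30, 10, 20],
        "sleep_summary_id": [78] * 7,
    },
    index=pd.to_datetime(timestamps, unit="s"),
)

mask = df2["rr_quality"] > 0
sub = df2[mask]

# transform('mean') returns one value per input row, indexed like `sub`,
# so it can be assigned straight back to the masked rows of df2.
group_mean = sub.groupby([sub.index.hour, "sleep_summary_id"])["avg_hr"].transform("mean")
df2.loc[mask, "AVG_HR"] = group_mean
print(df2)
```

Rows where the mask is False receive no value and show up as NaN in AVG_HR.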

To filter using multiple conditions, use the element-wise operators &, | and ~ for and, or and not respectively. You also need to wrap each condition in parentheses, because these operators bind more tightly than comparisons such as >= and >:

df2[(df2['rr_quality'] >= 150) & (df2['hr_quality'] > 200)]
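
A short sketch of combining boolean masks on the question's rows. The thresholds here (20 and 190) are not from the question; they are chosen only so that the sample data produces non-empty results.

```python
import pandas as pd

# Rows copied from the question's table.
df2 = pd.DataFrame(
    {
        "hr_quality": [229, 223, 216, 198, 184, 173, 199],
        "rr_quality": [0, 0, 0, 40, 30, 10, 20],
    }
)

# Illustrative thresholds, picked to fit the sample rows.
good_rr = df2["rr_quality"] >= 20
good_hr = df2["hr_quality"] > 190

both = df2[good_rr & good_hr]    # and: both conditions hold
either = df2[good_rr | good_hr]  # or: at least one condition holds
no_rr = df2[~(df2["rr_quality"] > 0)]  # not: rows without a positive rr_quality

print(both)
print(either)
print(no_rr)
```

Naming each mask first, as above, sidesteps the precedence issue; when the conditions are written inline, each one needs its own parentheses.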
EdChum answered Oct 27 '22 09:10