Pandas, groupby where column value is greater than x

Tags:

pandas

I have a table like this:

    timestamp   avg_hr  hr_quality  avg_rr  rr_quality  activity  sleep_summary_id
    1422404668  66      229         0       0           13        78
    1422404670  64      223         0       0           20        78
    1422404672  64      216         0       0           11        78
    1422404674  66      198         0       40          9         78
    1422404676  65      184         0       30          3         78
    1422404678  64      173         0       10          17        78
    1422404680  66      199         0       20          118       78
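To make the question reproducible, here is a minimal sketch that rebuilds the sample rows as a DataFrame. It assumes that timestamp holds Unix epoch seconds and that it should become a DatetimeIndex, since the attempts below call df2.index.hour; the table itself does not confirm either detail.

```python
import pandas as pd

# Sample rows copied from the table above
df2 = pd.DataFrame({
    "timestamp": [1422404668, 1422404670, 1422404672, 1422404674,
                  1422404676, 1422404678, 1422404680],
    "avg_hr": [66, 64, 64, 66, 65, 64, 66],
    "hr_quality": [229, 223, 216, 198, 184, 173, 199],
    "avg_rr": [0, 0, 0, 0, 0, 0, 0],
    "rr_quality": [0, 0, 0, 40, 30, 10, 20],
    "activity": [13, 20, 11, 9, 3, 17, 118],
    "sleep_summary_id": [78, 78, 78, 78, 78, 78, 78],
})

# Assumption: timestamp is Unix epoch seconds. Moving it into a
# DatetimeIndex is what makes df2.index.hour available.
df2.index = pd.to_datetime(df2.pop("timestamp"), unit="s")
print(df2)
```

All seven sample timestamps fall in the same hour (00:xx UTC), so df2.index.hour is 0 for every row.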

I'm trying to group the data by timestamp (hour), sleep_summary_id and rr_quality, where rr_quality is > 0.

I've tried the following, and none of them works:

    df3 = df2.groupby([df2.index.hour,'sleep_summary_id',df2['rr_quality']>0])
    df3 = df2.groupby([df2.index.hour,'sleep_summary_id','rr_quality'>0])
    df3 = df2.groupby([df2.index.hour,'sleep_summary_id',['rr_quality']>0])

All of them return a KeyError.
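For what it's worth, a boolean Series is a valid groupby key in current pandas, so the first attempt is not the right tool even when it runs: instead of dropping rows, it adds a False/True level to the groups. A minimal sketch, assuming a datetime index built from the timestamp column (Unix seconds) and keeping only three of the columns:

```python
import pandas as pd

# Assumed setup: three columns from the table, indexed by the timestamps
# (Unix seconds, 2 s apart) converted to datetimes.
df2 = pd.DataFrame(
    {
        "avg_hr": [66, 64, 64, 66, 65, 64, 66],
        "rr_quality": [0, 0, 0, 40, 30, 10, 20],
        "sleep_summary_id": [78] * 7,
    },
    index=pd.to_datetime([1422404668 + 2 * i for i in range(7)], unit="s"),
)

# The boolean Series becomes a third grouping level; no row is dropped.
sizes = df2.groupby([df2.index.hour, "sleep_summary_id", df2["rr_quality"] > 0]).size()
print(sizes)
```

The three rows with rr_quality == 0 still appear, in the False group.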

EDIT:

I also can't seem to pass more than one filter at a time. I tried the following:

    df2[df2['rr_quality'] >= 150, df2['hr_quality'] > 200]
    df2[df2['rr_quality'] >= 150, ['hr_quality'] > 200]
    df2[[df2['rr_quality'] >= 150, ['hr_quality'] > 200]]

which returns: TypeError: 'Series' objects are mutable, thus they cannot be hashed

asked Apr 14 '15 16:04

2 Answers

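I know this is old, but I wanted to add that pandas has a built-in method for filtering groups, GroupBy.filter. Adapting the example from the pandas documentation to your case:

grouped_df2 = df2.groupby([df2.index.hour, 'sleep_summary_id', 'rr_quality'])
grouped_df2.filter(lambda x: (x['rr_quality'] > 0).all())

Note that the function passed to filter must return a single True/False per group (hence the .all()), and that filter returns a DataFrame with the rows of the groups that passed, not a grouped object.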
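A runnable sketch of the filter approach on the sample rows. The DataFrame construction (a datetime index from Unix-second timestamps) is an assumption made so that df2.index.hour works; the lambda returns one boolean per group, which is what GroupBy.filter expects.

```python
import pandas as pd

# Assumed setup: the sample rows, indexed by the timestamps
# (Unix seconds) converted to a DatetimeIndex.
df2 = pd.DataFrame(
    {
        "avg_hr": [66, 64, 64, 66, 65, 64, 66],
        "hr_quality": [229, 223, 216, 198, 184, 173, 199],
        "rr_quality": [0, 0, 0, 40, 30, 10, 20],
        "sleep_summary_id": [78] * 7,
    },
    index=pd.to_datetime(
        [1422404668, 1422404670, 1422404672, 1422404674,
         1422404676, 1422404678, 1422404680],
        unit="s",
    ),
)

grouped_df2 = df2.groupby([df2.index.hour, "sleep_summary_id", "rr_quality"])

# filter() keeps or drops whole groups. rr_quality is a grouping key, so it is
# constant inside each group and .all() collapses the mask to one boolean.
filtered = grouped_df2.filter(lambda g: (g["rr_quality"] > 0).all())
print(filtered)
```

The result keeps the original row order and all columns, so it can be grouped again afterwards if an aggregate is needed.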

answered Oct 27 '22 08:10

Czarking

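The simplest thing to do here is to filter the df first and then perform the groupby. Build the hour key from the filtered frame, so that it has the same length as the rows being grouped:

filtered = df2[df2['rr_quality'] > 0]
filtered.groupby([filtered.index.hour, 'sleep_summary_id'])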
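A sketch of the filter-then-groupby approach on the sample rows, assuming the same datetime index built from Unix-second timestamps. Taking the mean of avg_hr is only an illustrative aggregation; the question does not say which one is needed.

```python
import pandas as pd

# Assumed setup: sample rows with a DatetimeIndex from the timestamps.
df2 = pd.DataFrame(
    {
        "avg_hr": [66, 64, 64, 66, 65, 64, 66],
        "rr_quality": [0, 0, 0, 40, 30, 10, 20],
        "sleep_summary_id": [78] * 7,
    },
    index=pd.to_datetime([1422404668 + 2 * i for i in range(7)], unit="s"),
)

filtered = df2[df2["rr_quality"] > 0]

# Build the hour key from `filtered`, not `df2`: an array-like key must have
# one entry per row of the frame being grouped.
result = filtered.groupby([filtered.index.hour, "sleep_summary_id"])["avg_hr"].mean()
print(result)
```

With the sample data there is a single group (hour 0, id 78), whose mean avg_hr over the four remaining rows is 65.25.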

EDIT

If you intend to assign this back to your original df:

mask = df2['rr_quality'] > 0
filtered = df2[mask]
df2.loc[mask, 'AVG_HR'] = filtered.groupby([filtered.index.hour, 'sleep_summary_id'])['avg_hr'].transform('mean')

The .loc call masks the left-hand side, so the result of transform, which is indexed like the filtered rows, lines up with the rows being assigned; all other rows get NaN in AVG_HR.
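The same assignment on the sample rows, as a sketch with an assumed datetime index (Unix-second timestamps). It makes visible what the .loc mask leaves behind: rows with rr_quality == 0 end up with NaN in the new AVG_HR column.

```python
import pandas as pd

# Assumed setup: sample rows with a DatetimeIndex from the timestamps.
df2 = pd.DataFrame(
    {
        "avg_hr": [66, 64, 64, 66, 65, 64, 66],
        "rr_quality": [0, 0, 0, 40, 30, 10, 20],
        "sleep_summary_id": [78] * 7,
    },
    index=pd.to_datetime([1422404668 + 2 * i for i in range(7)], unit="s"),
)

mask = df2["rr_quality"] > 0
filtered = df2[mask]

# transform() returns one value per row of `filtered`, indexed like it.
group_mean = filtered.groupby([filtered.index.hour, "sleep_summary_id"])["avg_hr"].transform("mean")

# Assigning through the same mask puts each value on its own row.
df2.loc[mask, "AVG_HR"] = group_mean
print(df2)
```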

To filter using multiple conditions, you need to use the element-wise logical operators &, | and ~ (for and, or and not respectively). You also need to wrap each condition in parentheses, because & and | bind more tightly than comparison operators such as >= and >:

df2[(df2['rr_quality'] >= 150) & (df2['hr_quality'] > 200)]
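Roughly speaking, df2[a, b] passes the tuple (a, b) as a single indexing key, which pandas then tries to treat as a column label; that is where the hashing error in the question comes from. Python's and/or/not do not work either, because they need one truth value for a whole Series. The sketch below uses the sample rows with made-up thresholds (the question's >= 150 would match nothing here) to combine masks with &, | and ~:

```python
import pandas as pd

# Assumed setup: three columns from the sample table, default RangeIndex.
df2 = pd.DataFrame({
    "rr_quality": [0, 0, 0, 40, 30, 10, 20],
    "hr_quality": [229, 223, 216, 198, 184, 173, 199],
    "activity": [13, 20, 11, 9, 3, 17, 118],
})

# AND: both conditions must hold (each wrapped in parentheses)
both = df2[(df2["rr_quality"] >= 20) & (df2["hr_quality"] > 190)]

# OR: at least one condition holds
either = df2[(df2["rr_quality"] >= 30) | (df2["activity"] > 100)]

# NOT: invert a combined mask
neither = df2[~((df2["rr_quality"] > 0) | (df2["activity"] > 100))]

print(both, either, neither, sep="\n\n")
```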

answered Oct 27 '22 09:10

EdChum

Question by cyberbemon (tags: python, pandas)