Resampling in Pandas while keeping value associations

Tags: python, datetime, pandas

Starting with something like this:

import numpy as np
import pandas as pd
from pandas import DataFrame

# Two observations on the same day: (air_temperature, heat_index) pairs
time = np.array(('2015-08-01T07:00:00','2015-08-01T19:00:00'),dtype='datetime64[ns]')
heat_index = np.array([101,103])
air_temperature = np.array([96,95])

df = DataFrame({'air_temperature':air_temperature,'heat_index':heat_index},index=time)

yielding this for df:

                     air_temperature    heat_index
2015-08-01 07:00:00  96                 101
2015-08-01 19:00:00  95                 103

then resample daily:

df_daily = df.resample('24h').max()

To get this for df_daily:

            air_temperature     heat_index
2015-08-01  96                  103

So by resampling with max(), pandas splits the data into 24-hour periods and takes the maximum of each column independently within each period.

But as you can see from the df output for 2015-08-01, that day's maximum heat index (which occurs at 19:00:00) does not correspond to the air temperature that occurred at the same time. That is, the heat index of 103F came with an air temperature of 95F. This association is lost through resampling, and we end up pairing it with the air temperature from a different part of the day.
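To see the problem concretely, here is a small sketch that rebuilds the two observations with pd.to_datetime (an assumption for brevity, equivalent to the np.array setup above) and asks pandas, per day, where each column's maximum came from. DataFrameGroupBy.idxmax with pd.Grouper(freq="D") should report 07:00 for air_temperature and 19:00 for heat_index, which is exactly why the daily max row mixes two different moments.

```python
import pandas as pd

# Same two observations as in the question.
idx = pd.to_datetime(["2015-08-01 07:00:00", "2015-08-01 19:00:00"])
df = pd.DataFrame({"air_temperature": [96, 95], "heat_index": [101, 103]}, index=idx)

# Each column's daily maximum is computed independently ...
daily_max = df.resample("D").max()

# ... so each column's maximum can come from a different timestamp.
where_max = df.groupby(pd.Grouper(freq="D")).idxmax()

print(daily_max)
print(where_max)
```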

Is there a way to resample just one column, and preserve the value in another column at the same index? So that the final outcome would look like this:

            air_temperature     heat_index
2015-08-01  95                  103

My first guess is to just resample the heat_index column...

df_daily = df.resample('24h').agg({'heat_index': 'max'})

to get...

            heat_index
2015-08-01  103

...and then to do some sort of DataFrame.loc or DataFrame.ix lookup from there, but I have been unsuccessful. Any thoughts on how to find the related value after resampling (e.g. the air_temperature that occurred at the same time as what turns out to be the maximum heat_index)?
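One way to finish the .loc idea is to "resample" only heat_index with idxmax instead of max, so the result is the timestamp of each day's peak rather than its value, and then use those timestamps to pull whole rows. This is a sketch, not the asker's code; it uses SeriesGroupBy.idxmax with pd.Grouper(freq="D") and assumes every day in range has at least one observation.

```python
import pandas as pd

idx = pd.to_datetime(["2015-08-01 07:00:00", "2015-08-01 19:00:00"])
df = pd.DataFrame({"air_temperature": [96, 95], "heat_index": [101, 103]}, index=idx)

# Step 1: for each day, find the timestamp at which heat_index peaks.
# The result is a Series (indexed by day) whose values are timestamps.
peak_times = df.groupby(pd.Grouper(freq="D"))["heat_index"].idxmax()

# Step 2: look those timestamps up with .loc to get the complete rows,
# so air_temperature stays paired with its own heat_index.
daily_peaks = df.loc[peak_times]

# Optional: label rows by day instead of by the exact time of the peak.
daily_peaks.index = peak_times.index

print(daily_peaks)
```

Keeping the exact peak times (skipping the relabelling step) can also be useful, since it tells you when during the day the maximum occurred.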


asked Aug 12 '15 22:08

csg2136

1 Answer

Here's one way: grouping with pd.Grouper(freq='24h') (the replacement for the since-removed pd.TimeGrouper) is essentially what resample is doing, and the function passed to apply then reduces each group to the row with the maximum heat_index.

In [60]: (df.groupby(pd.Grouper(freq='24h'))
            .apply(lambda g: g.loc[g['heat_index'].idxmax()]))

Out[60]: 
            air_temperature  heat_index
2015-08-01               95         103
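If you would rather avoid calling a Python function per group, a vectorised alternative (a different technique, not part of the answer above) is to sort by heat_index and keep only the last row for each calendar day. The helper column name day is hypothetical, chosen for this sketch.

```python
import pandas as pd

idx = pd.to_datetime(["2015-08-01 07:00:00", "2015-08-01 19:00:00"])
df = pd.DataFrame({"air_temperature": [96, 95], "heat_index": [101, 103]}, index=idx)

# Label each row with its calendar day (daily bins, like resample("D")).
day = df.index.floor("D")

# Sort ascending so each day's largest heat_index comes last; mergesort is
# stable, so ties keep their original (chronological) order. Then keep the
# last row per day and index the result by day.
result = (
    df.assign(day=day)
      .sort_values("heat_index", kind="mergesort")
      .drop_duplicates("day", keep="last")
      .set_index("day")
      .sort_index()
)

print(result)
```

On ties this keeps the later observation of the day, whereas idxmax returns the first one, so pick whichever tie-breaking rule suits your data.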


answered Nov 03 '22 20:11

chrisb
