resample irregularly spaced data in pandas

Tags: python, pandas

Is it possible to use resample on irregularly spaced data? (I know the documentation says it's for "resampling of regular time-series data", but I wanted to see whether it works on irregular data, too. Maybe it doesn't, or maybe I am doing something wrong.)

In my real data, I generally have 2 samples per hour, usually 20 to 40 minutes apart. So I was hoping to resample them to a regular hourly series.

To test whether I am using it right, I used a random list of dates that I already had. It may not be the best example, but a solution that works for it should be very robust. Here it is:

    fraction  number                time
0   0.729797       0 2014-10-23 15:44:00
1   0.141084       1 2014-10-30 19:10:00
2   0.226900       2 2014-11-05 21:30:00
3   0.960937       3 2014-11-07 05:50:00
4   0.452835       4 2014-11-12 12:20:00
5   0.578495       5 2014-11-13 13:57:00
6   0.352142       6 2014-11-15 05:00:00
7   0.104814       7 2014-11-18 07:50:00
8   0.345633       8 2014-11-19 13:37:00
9   0.498004       9 2014-11-19 22:47:00
10  0.131665      10 2014-11-24 15:28:00
11  0.654018      11 2014-11-26 10:00:00
12  0.886092      12 2014-12-04 06:37:00
13  0.839767      13 2014-12-09 00:50:00
14  0.257997      14 2014-12-09 02:00:00
15  0.526350      15 2014-12-09 02:33:00

Now I want to resample these, for example monthly:

df_new = df.set_index(pd.DatetimeIndex(df['time']))
df_new['fraction'] = df.fraction.resample('M',how='mean')
df_new['number'] = df.number.resample('M',how='mean')

But I get TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'. Unless I did something wrong when assigning the datetime index, it must be due to the irregularity?

So my questions are:

1. Am I using it correctly?
2. If 1==True, is there no straightforward way to resample the data?

(I only see a solution in first reindexing the data to get finer intervals, interpolate the values in between and then reindexing it to hourly interval. If it is so, then a question regarding the correct implementation of reindex will follow shortly.)
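For the hourly case, the two candidate approaches (averaging whatever falls in each hour, versus interpolating onto an hourly grid as described in the parenthesis above) might look roughly like the sketch below. The timestamps and values are made up for illustration and are not the real data; the variable names are likewise hypothetical, and "60min" is used as the hourly frequency string only because it is accepted across pandas versions.

```python
import pandas as pd

# Hypothetical irregular samples, roughly two per hour (NOT the real data).
times = pd.to_datetime([
    "2014-10-23 00:05", "2014-10-23 00:40", "2014-10-23 01:10",
    "2014-10-23 01:35", "2014-10-23 02:15", "2014-10-23 02:50",
    "2014-10-23 03:20",
])
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], index=times)

# Option A: average the samples that fall inside each hourly bin.
hourly_mean = s.resample("60min").mean()

# Option B: estimate the value exactly on each full hour.
# Build the hourly grid that lies inside the observed time range.
grid = pd.date_range(
    s.index.min().ceil("60min"),
    s.index.max().floor("60min"),
    freq="60min",
)
hourly_interp = (
    s.reindex(s.index.union(grid))   # add the grid points as NaN rows
     .interpolate(method="time")     # linear in time between neighbours
     .reindex(grid)                  # keep only the full-hour stamps
)

print(hourly_mean)
print(hourly_interp)
```

Option A needs no regular spacing at all, since resample only groups rows by time bin. Option B is closer to the reindex-and-interpolate idea and is only a reasonable choice if linear interpolation between neighbouring samples is acceptable for the data.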


asked Dec 13 '16 00:12

durbachit

1 Answer

You don't need to explicitly use DatetimeIndex, just set 'time' as the index and pandas will take care of the rest, so long as your 'time' column has been converted to datetime using pd.to_datetime or some other method. Additionally, you don't need to resample each column individually if you're using the same method; just do it on the entire DataFrame.

# Convert to datetime, if necessary.
df['time'] = pd.to_datetime(df['time'])

# Set the index and resample (using month start freq for compact output).
df = df.set_index('time')
df = df.resample('MS').mean()

The resulting output:

            fraction  number
time                        
2014-10-01  0.435441     0.5
2014-11-01  0.430544     6.5
2014-12-01  0.627552    13.5
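For anyone who wants to reproduce this end to end, here is a self-contained sketch that rebuilds the sample frame from the question. It also shows one likely reason for the original TypeError: the question's code calls resample on columns of df, which still has its default RangeIndex, rather than on df_new, so the irregular spacing is not the cause. The variable names raised and monthly are only illustrative. Note also that the how= keyword used in the question has been deprecated since pandas 0.18 in favour of chaining an aggregation such as .mean().

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "fraction": [0.729797, 0.141084, 0.226900, 0.960937, 0.452835, 0.578495,
                 0.352142, 0.104814, 0.345633, 0.498004, 0.131665, 0.654018,
                 0.886092, 0.839767, 0.257997, 0.526350],
    "number": list(range(16)),
    "time": ["2014-10-23 15:44:00", "2014-10-30 19:10:00", "2014-11-05 21:30:00",
             "2014-11-07 05:50:00", "2014-11-12 12:20:00", "2014-11-13 13:57:00",
             "2014-11-15 05:00:00", "2014-11-18 07:50:00", "2014-11-19 13:37:00",
             "2014-11-19 22:47:00", "2014-11-24 15:28:00", "2014-11-26 10:00:00",
             "2014-12-04 06:37:00", "2014-12-09 00:50:00", "2014-12-09 02:00:00",
             "2014-12-09 02:33:00"],
})
df["time"] = pd.to_datetime(df["time"])

# df still has a RangeIndex here, so resampling one of its columns fails,
# regardless of how regular or irregular the timestamps are.
try:
    df["fraction"].resample("MS").mean()
    raised = False
except TypeError:
    raised = True

# With 'time' as the index, the whole frame resamples in one call.
monthly = df.set_index("time").resample("MS").mean()
print(raised)
print(monthly)
```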

answered Oct 31 '22 02:10

root
