
resample irregularly spaced data in pandas

Tags:

python

pandas

Is it somehow possible to use resample on irregularly spaced data? (I know that the documentation says it's for "resampling of regular time-series data", but I wanted to try if it works on irregular data, too. Maybe it doesn't, or maybe I am doing something wrong.)

In my real data, I have generally 2 samples per hour, the time difference between them ranging usually from 20 to 40 minutes. So I was hoping to resample them to a regular hourly series.

To test whether I am using it correctly, I used a random list of dates that I already had. It may not be the best example, but a solution that works for it should be very robust. Here it is:

    fraction  number                time
0   0.729797       0 2014-10-23 15:44:00
1   0.141084       1 2014-10-30 19:10:00
2   0.226900       2 2014-11-05 21:30:00
3   0.960937       3 2014-11-07 05:50:00
4   0.452835       4 2014-11-12 12:20:00
5   0.578495       5 2014-11-13 13:57:00
6   0.352142       6 2014-11-15 05:00:00
7   0.104814       7 2014-11-18 07:50:00
8   0.345633       8 2014-11-19 13:37:00
9   0.498004       9 2014-11-19 22:47:00
10  0.131665      10 2014-11-24 15:28:00
11  0.654018      11 2014-11-26 10:00:00
12  0.886092      12 2014-12-04 06:37:00
13  0.839767      13 2014-12-09 00:50:00
14  0.257997      14 2014-12-09 02:00:00
15  0.526350      15 2014-12-09 02:33:00

Now I want to resample these, for example to a monthly frequency:

df_new = df.set_index(pd.DatetimeIndex(df['time']))
df_new['fraction'] = df.fraction.resample('M',how='mean')
df_new['number'] = df.number.resample('M',how='mean')

But I get TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex' - unless I did something wrong with assigning the datetime index, it must be due to the irregularity?

So my questions are:

  1. Am I using it correctly?
  2. If 1==True, is there no straightforward way to resample the data?

(I only see a solution in first reindexing the data to get finer intervals, interpolate the values in between and then reindexing it to hourly interval. If it is so, then a question regarding the correct implementation of reindex will follow shortly.)
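A minimal sketch of the reindex-and-interpolate idea described in the parenthesis above, assuming readings spaced 20 to 40 minutes apart. The sample values and the names `s`, `hourly_index` and `hourly` are made up for illustration and are not from the real data. Instead of first building a finer grid, it adds the hourly grid points to the existing index, interpolates linearly in time, and then keeps only the hourly points.

```python
import pandas as pd

# Hypothetical readings roughly 20-40 minutes apart (made-up values).
times = pd.to_datetime([
    "2014-10-23 15:44", "2014-10-23 16:10", "2014-10-23 16:45",
    "2014-10-23 17:20", "2014-10-23 17:55", "2014-10-23 18:30",
])
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=times)

# Regular hourly grid that lies inside the observed time range.
hourly_index = pd.date_range(
    s.index.min().ceil("h"), s.index.max().floor("h"), freq="h"
)

# Add the grid points to the index, interpolate linearly in time,
# then keep only the values that fall on the hourly grid.
hourly = (
    s.reindex(s.index.union(hourly_index))
     .interpolate(method="time")
     .reindex(hourly_index)
)
print(hourly)
```

If an hourly average is enough and no interpolation is needed, `s.resample("h").mean()` aggregates the irregular readings directly.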

asked Dec 13 '16 at 00:12 by durbachit




1 Answer

You don't need to explicitly use DatetimeIndex, just set 'time' as the index and pandas will take care of the rest, so long as your 'time' column has been converted to datetime using pd.to_datetime or some other method. Additionally, you don't need to resample each column individually if you're using the same method; just do it on the entire DataFrame.

import pandas as pd

# Convert to datetime, if necessary.
df['time'] = pd.to_datetime(df['time'])

# Set the index and resample (using month start freq for compact output).
df = df.set_index('time')
df = df.resample('MS').mean()

The resulting output:

            fraction  number
time                        
2014-10-01  0.435441     0.5
2014-11-01  0.430544     6.5
2014-12-01  0.627552    13.5
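
For reference, a self-contained sketch that rebuilds the sample data from the question and runs the same steps. It also shows where the TypeError most likely comes from: the question's code calls resample on the original `df`, whose index is still a RangeIndex, rather than on the re-indexed `df_new`. The variable name `monthly` is chosen here for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "fraction": [0.729797, 0.141084, 0.226900, 0.960937, 0.452835, 0.578495,
                 0.352142, 0.104814, 0.345633, 0.498004, 0.131665, 0.654018,
                 0.886092, 0.839767, 0.257997, 0.526350],
    "number": range(16),
    "time": ["2014-10-23 15:44:00", "2014-10-30 19:10:00", "2014-11-05 21:30:00",
             "2014-11-07 05:50:00", "2014-11-12 12:20:00", "2014-11-13 13:57:00",
             "2014-11-15 05:00:00", "2014-11-18 07:50:00", "2014-11-19 13:37:00",
             "2014-11-19 22:47:00", "2014-11-24 15:28:00", "2014-11-26 10:00:00",
             "2014-12-04 06:37:00", "2014-12-09 00:50:00", "2014-12-09 02:00:00",
             "2014-12-09 02:33:00"],
})
df["time"] = pd.to_datetime(df["time"])

# Resampling a column of the original frame fails, because its index
# is a RangeIndex and not a DatetimeIndex.
try:
    df["fraction"].resample("MS").mean()
except TypeError as exc:
    print("TypeError:", exc)

# Setting 'time' as the index first makes the irregular data resample fine.
monthly = df.set_index("time").resample("MS").mean()
print(monthly)
```

The irregular spacing is not the problem here: resample only groups the rows into calendar bins, so uneven gaps between timestamps are fine as long as the index is datetime-like.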
answered Oct 31 '22 at 02:10 by root
