Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas Datetime Interval Resample to Seconds

Tags:

python

pandas

Given the following dataframe:

import pandas as pd

pd.DataFrame({"start": ["2017-01-01 13:09:01", "2017-01-01 13:09:07", "2017-01-01 13:09:12"],
         "end":    ["2017-01-01 13:09:05", "2017-01-01 13:09:09", "2017-01-01 13:09:14"],
         "status": ["OK", "ERROR", "OK"]})

HAVE:

| start               | end                 | status |
|---------------------|---------------------|--------|
| 2017-01-01 13:09:01 | 2017-01-01 13:09:05 | OK     |
| 2017-01-01 13:09:07 | 2017-01-01 13:09:09 | ERROR  | 
| 2017-01-01 13:09:12 | 2017-01-01 13:09:14 | OK     |

I want to convert it to another format, that is, "unfold" the intervals and make them into a DatetimeIndex, and resample the data. The result should look like this:

WANT:

|                     | status    |
|---------------------|-----------|
| 2017-01-01 13:09:01 | OK        |
| 2017-01-01 13:09:02 | OK        |
| 2017-01-01 13:09:03 | OK        |
| 2017-01-01 13:09:04 | OK        |
| 2017-01-01 13:09:05 | OK        |
| 2017-01-01 13:09:06 | NAN       |
| 2017-01-01 13:09:07 | ERROR     |
| 2017-01-01 13:09:08 | ERROR     |
| 2017-01-01 13:09:09 | ERROR     |
| 2017-01-01 13:09:10 | NAN       |
| 2017-01-01 13:09:11 | NAN       |
| 2017-01-01 13:09:12 | OK        |
| 2017-01-01 13:09:13 | OK        |
| 2017-01-01 13:09:14 | OK        |

Any help is very much appreciated!

like image 727
r0f1 Avatar asked Jan 29 '18 15:01

r0f1


People also ask

How do I resample data in pandas?

Pandas Series: resample() functionThe resample() function is used to resample time-series data. Convenience method for frequency conversion and resampling of time series. Object must have a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex), or pass datetime-like values to the on or level keyword.

What does DF resample do?

Resampling generates a unique sampling distribution on the basis of the actual data. We can apply various frequency to resample our time series data. This is a very important technique in the field of analytics. There are many other types of time series frequency available.

How do you resample a dataset in Python?

resample() method. To aggregate or temporal resample the data for a time period, you can take all of the values for each day and summarize them. In this case, you want total daily rainfall, so you will use the resample() method together with . sum() .


2 Answers

Using IntervalIndex:

# create an IntervalIndex from start/end
iv_idx = pd.IntervalIndex.from_arrays(df['start'], df['end'], closed='both')

# generate the desired index of individual times
new_idx = pd.date_range(df['start'].min(), df['end'].max(), freq='s')

# set the index of 'status' as the IntervalIndex, then reindex to the new index
result = df['status'].set_axis(iv_idx, inplace=False).reindex(new_idx)

The resulting output for result:

2017-01-01 13:09:01       OK
2017-01-01 13:09:02       OK
2017-01-01 13:09:03       OK
2017-01-01 13:09:04       OK
2017-01-01 13:09:05       OK
2017-01-01 13:09:06      NaN
2017-01-01 13:09:07    ERROR
2017-01-01 13:09:08    ERROR
2017-01-01 13:09:09    ERROR
2017-01-01 13:09:10      NaN
2017-01-01 13:09:11      NaN
2017-01-01 13:09:12       OK
2017-01-01 13:09:13       OK
2017-01-01 13:09:14       OK
Freq: S, Name: status, dtype: object
like image 198
root Avatar answered Oct 25 '22 05:10

root


Let's try this, using apply and rebuild series with date_range, then resample to fill missing time populated with NaN from asfreq:

df.apply(lambda x: pd.Series(index=pd.date_range(x['start'], 
                                                 x['end'],
                                                 freq='S'), 
                             data=x['status']), axis=1)\
  .T\
  .stack().reset_index(level=1, drop=True)\
  .resample('S').asfreq()

Output:

2017-01-01 13:09:01       OK
2017-01-01 13:09:02       OK
2017-01-01 13:09:03       OK
2017-01-01 13:09:04       OK
2017-01-01 13:09:05       OK
2017-01-01 13:09:06      NaN
2017-01-01 13:09:07    ERROR
2017-01-01 13:09:08    ERROR
2017-01-01 13:09:09    ERROR
2017-01-01 13:09:10      NaN
2017-01-01 13:09:11      NaN
2017-01-01 13:09:12       OK
2017-01-01 13:09:13       OK
2017-01-01 13:09:14       OK
Freq: S, dtype: object
like image 1
Scott Boston Avatar answered Oct 25 '22 04:10

Scott Boston