Pandas Datetime Interval Resample to Seconds

Tags:

python

pandas

Given the following dataframe:

import pandas as pd

pd.DataFrame({"start": ["2017-01-01 13:09:01", "2017-01-01 13:09:07", "2017-01-01 13:09:12"],
         "end":    ["2017-01-01 13:09:05", "2017-01-01 13:09:09", "2017-01-01 13:09:14"],
         "status": ["OK", "ERROR", "OK"]})

HAVE:

| start               | end                 | status |
|---------------------|---------------------|--------|
| 2017-01-01 13:09:01 | 2017-01-01 13:09:05 | OK     |
| 2017-01-01 13:09:07 | 2017-01-01 13:09:09 | ERROR  | 
| 2017-01-01 13:09:12 | 2017-01-01 13:09:14 | OK     |

I want to convert it to another format, that is, "unfold" the intervals and make them into a DatetimeIndex, and resample the data. The result should look like this:

WANT:

|                     | status    |
|---------------------|-----------|
| 2017-01-01 13:09:01 | OK        |
| 2017-01-01 13:09:02 | OK        |
| 2017-01-01 13:09:03 | OK        |
| 2017-01-01 13:09:04 | OK        |
| 2017-01-01 13:09:05 | OK        |
| 2017-01-01 13:09:06 | NAN       |
| 2017-01-01 13:09:07 | ERROR     |
| 2017-01-01 13:09:08 | ERROR     |
| 2017-01-01 13:09:09 | ERROR     |
| 2017-01-01 13:09:10 | NAN       |
| 2017-01-01 13:09:11 | NAN       |
| 2017-01-01 13:09:12 | OK        |
| 2017-01-01 13:09:13 | OK        |
| 2017-01-01 13:09:14 | OK        |

Any help is very much appreciated!

727

asked Jan 29 '18 15:01

r0f1

2 Answers

Using IntervalIndex:

# create an IntervalIndex from start/end
iv_idx = pd.IntervalIndex.from_arrays(df['start'], df['end'], closed='both')

# generate the desired index of individual times
new_idx = pd.date_range(df['start'].min(), df['end'].max(), freq='s')

# set the index of 'status' as the IntervalIndex, then reindex to the new index
result = df['status'].set_axis(iv_idx, inplace=False).reindex(new_idx)

The resulting output for result:

2017-01-01 13:09:01       OK
2017-01-01 13:09:02       OK
2017-01-01 13:09:03       OK
2017-01-01 13:09:04       OK
2017-01-01 13:09:05       OK
2017-01-01 13:09:06      NaN
2017-01-01 13:09:07    ERROR
2017-01-01 13:09:08    ERROR
2017-01-01 13:09:09    ERROR
2017-01-01 13:09:10      NaN
2017-01-01 13:09:11      NaN
2017-01-01 13:09:12       OK
2017-01-01 13:09:13       OK
2017-01-01 13:09:14       OK
Freq: S, Name: status, dtype: object

198

answered Oct 25 '22 05:10

root

Let's try this, using apply and rebuild series with date_range, then resample to fill missing time populated with NaN from asfreq:

df.apply(lambda x: pd.Series(index=pd.date_range(x['start'], 
                                                 x['end'],
                                                 freq='S'), 
                             data=x['status']), axis=1)\
  .T\
  .stack().reset_index(level=1, drop=True)\
  .resample('S').asfreq()

Output:

2017-01-01 13:09:01       OK
2017-01-01 13:09:02       OK
2017-01-01 13:09:03       OK
2017-01-01 13:09:04       OK
2017-01-01 13:09:05       OK
2017-01-01 13:09:06      NaN
2017-01-01 13:09:07    ERROR
2017-01-01 13:09:08    ERROR
2017-01-01 13:09:09    ERROR
2017-01-01 13:09:10      NaN
2017-01-01 13:09:11      NaN
2017-01-01 13:09:12       OK
2017-01-01 13:09:13       OK
2017-01-01 13:09:14       OK
Freq: S, dtype: object

answered Oct 25 '22 04:10

Scott Boston

Related questions
                            
                                Unknown PG numeric type 25
                            
                                AttributeError: 'Series' object has no attribute 'notna'
                            
                                "Installing From Source" Within Anaconda Environment
                            
                                Is python asyncio call_soon_threadsafe really thread-safe?
                            
                                django map widget console showing DjangoGooglePointFieldWidget Uncaught ReferenceError
                            
                                Python: ImportError: No module named _pluggy
                            
                                Python/Bash - Get filenames with escaped characters
                            
                                Are Pyspark and Pandas certified to work together? [closed]
                            
                                Python Pandas average based on condition into new column
                            
                                I use to_gbq on pandas for updating Google BigQuery and get GenericGBQException
                            
                                Matplotlib Table- Assign different text alignments to different columns
                            
                                scikit learn: custom classifier compatible with GridSearchCV
                            
                                How can I overload operators so that type on the left/right does not matter?
                            
                                Sum of distances from a point to all other points
                            
                                OSRM giving wrong response for distance between 2 points
                            
                                Socket Java client - Python Server
                            
                                Understanding Scipy Convolution
                            
                                How to tell if a python module is intended to be python 2 or python 3?
                            
                                PyTorch: training with GPU gives worse error than training the same thing with CPU
                            
                                PySpark Numeric Window Group By

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With