I have a data frame containing financial data sampled at 1-minute intervals. Occasionally a row or two of data might be missing.
#Example Input---------------------------------------------
open high low close
2019-02-07 16:01:00 124.624 124.627 124.647 124.617
2019-02-07 16:04:00 124.646 124.655 124.664 124.645
# Desired Output--------------------------------------------
open high low close
2019-02-07 16:01:00 124.624 124.627 124.647 124.617
2019-02-07 16:02:00 NaN NaN NaN NaN
2019-02-07 16:03:00 NaN NaN NaN NaN
2019-02-07 16:04:00 124.646 124.655 124.664 124.645
My current method is based on this post - Find missing minute data in time series data using pandas - which advises only how to identify the gaps, not how to fill them.
What I'm doing is creating a DatetimeIndex of 1-minute intervals. Then, using this index, I create an entirely new dataframe, which can then be merged into my original dataframe, thus filling the gaps. The code is shown below. It seems quite a roundabout way of doing this, and I would like to know if there is a better way. Maybe by resampling the data?
import pandas as pd
from datetime import datetime
# Initialise prices dataframe with missing data
prices = pd.DataFrame([[datetime(2019,2,7,16,0), 124.634, 124.624, 124.65, 124.62],
                       [datetime(2019,2,7,16,4), 124.624, 124.627, 124.647, 124.617]])
prices.columns = ['datetime','open','high','low','close']
prices = prices.set_index('datetime')
print(prices)
# Create a new dataframe with complete set of time intervals
idx_ref = pd.date_range(start=datetime(2019,2,7,16,0), end=datetime(2019,2,7,16,4), freq='min')
df = pd.DataFrame(index=idx_ref)
# Merge the two dataframes
prices = pd.merge(df, prices, how='outer', left_index=True,
                  right_index=True)
print(prices)
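For comparison, the outer merge above can likely be collapsed into a single reindex call against the same idx_ref; a minimal sketch:
# reindex inserts the timestamps from idx_ref that are missing in prices
# and fills their rows with NaN, replacing the merge step above
prices = prices.reindex(idx_ref)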
Use DataFrame.asfreq, which works from a DatetimeIndex (starting again from the raw prices frame, before the index is set):
prices = prices.set_index('datetime').asfreq('1Min')
print(prices)
open high low close
datetime
2019-02-07 16:00:00 124.634 124.624 124.650 124.620
2019-02-07 16:01:00 NaN NaN NaN NaN
2019-02-07 16:02:00 NaN NaN NaN NaN
2019-02-07 16:03:00 NaN NaN NaN NaN
2019-02-07 16:04:00 124.624 124.627 124.647 124.617
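On the resampling idea from the question: resample followed by asfreq produces the same result, and an explicit reindex works as well. A minimal sketch, assuming prices already carries the DatetimeIndex:
# Group into 1-minute bins and take the value at each bin edge;
# bins that contain no data become NaN rows
filled = prices.resample('1Min').asfreq()

# Or reindex against an explicitly built complete index
full_idx = pd.date_range(prices.index.min(), prices.index.max(), freq='1Min')
filled = prices.reindex(full_idx)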
A more manual answer would be:
from datetime import timedelta
from dateutil import parser
import pandas as pd

df = pd.DataFrame({
    'a': ['2021-02-07 11:00:30', '2021-02-07 11:00:31', '2021-02-07 11:00:35'],
    'b': [64.8, 64.8, 50.3]
})

# Build the complete list of 1-second timestamps between min and max
max_dt = parser.parse(max(df['a']))
min_dt = parser.parse(min(df['a']))
dt_range = []
while min_dt <= max_dt:
    dt_range.append(min_dt.strftime("%Y-%m-%d %H:%M:%S"))
    min_dt += timedelta(seconds=1)

# Left-merge the original data onto the complete range; gaps become NaN
complete_df = pd.DataFrame({'a': dt_range})
final_df = complete_df.merge(df, how='left', on='a')
It converts the following dataframe:
a b
0 2021-02-07 11:00:30 64.8
1 2021-02-07 11:00:31 64.8
2 2021-02-07 11:00:35 50.3
to:
a b
0 2021-02-07 11:00:30 64.8
1 2021-02-07 11:00:31 64.8
2 2021-02-07 11:00:32 NaN
3 2021-02-07 11:00:33 NaN
4 2021-02-07 11:00:34 NaN
5 2021-02-07 11:00:35 50.3
whose null values we can then fill.
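For instance, the remaining NaNs could be forward-filled or interpolated; a minimal sketch on the final_df from above:
# Carry the last observed value of column 'b' forward into the gap
final_df['b'] = final_df['b'].ffill()

# ...or interpolate linearly between the surrounding observations instead
# final_df['b'] = final_df['b'].interpolate()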