Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Moving Averages for Time Series Data with Inconsistent Data Points

I have found plenty of information related to moving averages when the data is sampled to regular intervals (ie 1 min, 5 mins, etc). However, I need a solution for a time series dataset that has irregular time intervals.

The dataset contains two columns, Timestamp and Price. Timestamp goes down to the millisecond, and there is no set interval for rows. I need to take my dataframe and add three moving average columns:

1 min 5 min 10 min

I do not want to resample the data, I want the end result to be the same number of rows but with the three columns filled as applicable. (IE, NaN until the 1/5/10 min interval for each column, respectively)

I feel like I am getting close, but cannot figure out how to pass the moving average variable to this function:

import pandas as pd
import numpy as np

# Load IBM data from CSV
df = pd.read_csv(
    "C:/Documents/Python Scripts/MA.csv", names=['Timestamp',
                                                            'Price'])
# Create three moving average signals
df['Timestamp'] = pd.to_datetime(df['Timestamp'], errors='coerce')
df.set_index('Timestamp', inplace=True)


def movingaverage(values, window):
    weights = np.repeat(1.0, window)/window
    smas = np.convolve(values, weights, 'valid')
    return smas


MA_1M = movingaverage(df, 1)
MA_5M = movingaverage(df, 5)
MA_10M = movingaverage(df, 10)

print(MA_1M)

Example Data:

Timestamp               Price
2018-10-08 04:00:00.013 152.59
2018-10-08 04:00:00.223 156.34
2018-10-08 04:01:00.000 152.73
2018-10-08 04:05:00.127 156.34
2018-10-08 04:10:00.000 152.73

Expected Output:

Timestamp               Price  MA_1M  MA_5M  MA10M
2018-10-08 04:00:00.013 152.59 N/A    N/A    N/A  
2018-10-08 04:00:00.223 156.34 N/A    N/A    N/A 
2018-10-08 04:01:00.000 154.73 154.55 N/A    N/A 
2018-10-08 04:05:00.127 155.34 155.34 155.47 N/A 
2018-10-08 04:10:00.000 153.73 153.73 154.54 154.55

At each row, the MA column takes that timestamp and looks back 1, 5 or 10 minutes and calculates the average. The thing that makes this difficult is that the rows can be generated at any millisecond. In my code above I am simply trying to get a moving average to work with a time variable. I am assuming that as long as the row counts match I can use the logic to add a column to my df.

like image 435
Fungui Avatar asked Jun 14 '26 16:06

Fungui


1 Answers

The following works, except for the NaNs - I don't know how attached you are to those:

foo  = df.apply(lambda x: df[(df['Timestamp'] <= x['Timestamp']) & (df['timestamp']> x['timestamp'] - pd.Timedelta('5 min'))]['Price'].mean(), axis=1)
like image 166
Igor Rivin Avatar answered Jun 17 '26 13:06

Igor Rivin



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!