I have a DataFrame containing quotes for the EUR_USD currency pair at 1-second intervals. In principle it could be any interval; here, for example, it is 1-minute data:
2015-11-10 01:00:00+01:00 1.07616
2015-11-10 01:01:00+01:00 1.07605
2015-11-10 01:02:00+01:00 1.07590
2015-11-10 01:03:00+01:00 1.07592
2015-11-10 01:04:00+01:00 1.07583
I'd like to use linear regression to draw a trend line through the data in the DataFrame, but I'm not sure what the best way is to do that with a time series, especially one with intervals this short.
So far I've experimented by replacing the timestamps with a list of integers from 0 to the length of the series (this is just to show where I'd like to go with it):
import numpy as np
import matplotlib.pyplot as plt
x = list(range(len(df)))  # 0, 1, ..., n-1 in place of the timestamps
y = df["closeAsk"].tolist()
Then I use NumPy to fit a straight line (a degree-1 polynomial):
fit = np.polyfit(x,y,1)
fit_fn = np.poly1d(fit)
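To make the numbers returned by np.polyfit concrete, here is a minimal sketch on the five sample rows shown above. It assumes the timestamps form the index and the price column is named closeAsk, as in the code. For a degree-1 fit, polyfit returns the coefficients highest power first, i.e. [slope, intercept], so the slope is the price change per row (here, per minute).

```python
import numpy as np
import pandas as pd

# The five sample rows from the question (1-minute spacing).
idx = pd.to_datetime([
    "2015-11-10 01:00:00+01:00",
    "2015-11-10 01:01:00+01:00",
    "2015-11-10 01:02:00+01:00",
    "2015-11-10 01:03:00+01:00",
    "2015-11-10 01:04:00+01:00",
])
df = pd.DataFrame({"closeAsk": [1.07616, 1.07605, 1.07590, 1.07592, 1.07583]},
                  index=idx)

x = np.arange(len(df))          # 0, 1, ..., n-1 stand in for the timestamps
y = df["closeAsk"].to_numpy()

# deg=1 -> coefficients are returned as [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)
fit_fn = np.poly1d([slope, intercept])

print(f"slope per row: {slope:.2e}")
print(f"intercept:     {intercept:.5f}")
print("trend line end points:", fit_fn(0), fit_fn(len(df) - 1))
```

On these rows the slope comes out at about -7.9e-05 per row, i.e. a slight downward trend over the five minutes.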
Finally, I plot the fitted line along with df["closeAsk"] to visualize the trend:
plt.plot(x,df["closeAsk"], '-')
plt.plot(x,y, 'yo', x, fit_fn(x), '--k')
plt.show()
However, the x-axis now shows meaningless integers; I'd like it to show the timestamps instead.
To elaborate on my comment:
Say you have some evenly spaced time series data, time, and some correlated data, data, as you've laid out in your question.
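import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
time = pd.date_range('9:00', '10:00', freq='1s')
data = np.cumsum(np.random.randn(time.size))  # random walk as stand-in data
df = pd.DataFrame({'time': time,
                   'data': data})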
As you've shown, you can do a linear fit of the data with np.polyfit and create the trend line with np.poly1d:
x = np.arange(time.size) # = array([0, 1, 2, ..., 3598, 3599, 3600])
fit = np.polyfit(x, df['data'], 1)
fit_fn = np.poly1d(fit)
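One caveat with an integer x: it assumes the samples are evenly spaced, and the slope is per sample rather than per unit of time. An alternative (a sketch, not what the fit above does) is to regress on the seconds elapsed since the first timestamp, which still works if ticks are missing or irregular. It assumes a DatetimeIndex and a closeAsk column, as in the question, and reuses the question's five sample rows.

```python
import numpy as np
import pandas as pd

idx = pd.to_datetime([
    "2015-11-10 01:00:00+01:00",
    "2015-11-10 01:01:00+01:00",
    "2015-11-10 01:02:00+01:00",
    "2015-11-10 01:03:00+01:00",
    "2015-11-10 01:04:00+01:00",
])
df = pd.DataFrame({"closeAsk": [1.07616, 1.07605, 1.07590, 1.07592, 1.07583]},
                  index=idx)

# Seconds since the first sample: a real time axis for the regression.
# Unlike 0..n-1, this stays correct when the spacing is uneven.
t = (df.index - df.index[0]).total_seconds().to_numpy()

slope, intercept = np.polyfit(t, df["closeAsk"].to_numpy(), 1)
fit_fn = np.poly1d([slope, intercept])

# slope is in price units per second; rescale for readability
print(f"trend: {slope * 3600:.5f} per hour")

# Evaluate the trend at the original timestamps, keeping the DatetimeIndex
trend = pd.Series(fit_fn(t), index=df.index, name="trend")
print(trend)
```

Because these rows are exactly one minute apart, slope * 60 reproduces the per-row slope of the integer fit. The trend Series keeps the timestamps as its index, so it can be plotted against time directly.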
Then plot the data and the fit with df['time'] as the x-axis:
plt.plot(df['time'], fit_fn(x), 'k-')         # trend line
plt.plot(df['time'], df['data'], 'go', ms=2)  # raw data
plt.show()
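Putting those pieces together, a minimal end-to-end sketch might look like this. The fixed date, the random seed, the axis labels, the legend and the fig.autofmt_xdate() call are illustrative additions; the fit itself uses the same np.polyfit/np.poly1d approach, with the integer x valid here because the samples are evenly spaced.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic random-walk data; the seed only makes runs repeatable.
rng = np.random.default_rng(0)
time = pd.date_range("2015-11-10 09:00", "2015-11-10 10:00", freq="1s")
data = np.cumsum(rng.standard_normal(time.size))
df = pd.DataFrame({"time": time, "data": data})

# Fit against the sample number (uniform 1 s spacing), evaluate on the same x.
x = np.arange(time.size)
fit_fn = np.poly1d(np.polyfit(x, df["data"], 1))
trend = fit_fn(x)

fig, ax = plt.subplots()
ax.plot(df["time"], df["data"], "go", ms=2, label="data")
ax.plot(df["time"], trend, "k-", label="linear trend")
ax.set_xlabel("time")
ax.set_ylabel("data")
ax.legend()
fig.autofmt_xdate()  # rotate the datetime tick labels so they don't overlap
plt.show()
```

Since the x values passed to plot are the timestamps, Matplotlib formats the tick labels as times, while the regression itself still runs on plain numbers.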