Following up on my earlier question, How to get constant term in AR Model with statsmodels and Python?, I am now trying to fit the data with an ARMA model, but again I cannot work out how to interpret the model's result. Here is what I have done, based on ARMA out-of-sample prediction with statsmodels and the ARMAResults.predict API documentation.
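(For context, arma_model is a small helper of mine that fits an ARMA(p, q) model with statsmodels; its exact definition is not shown here, but it is roughly along these lines:)

import statsmodels.api as sm

def arma_model(series, p, q):
    # Fit an ARMA(p, q) model with a constant term, using the old statsmodels
    # ARMA class that matches the ARMAResults interface used below
    model = sm.tsa.ARMA(series, order=(p, q))
    return model.fit(disp=0)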
import matplotlib.pyplot as plt

# Parameters
INPUT_DATA_POINT = 200
P = 5
Q = 0

# Read the data (the series is in the sixth CSV column)
data = []
f = open('stock_all.csv', 'r')
for line in f:
    data.append(float(line.split(',')[5]))
f.close()

# Fit an ARMA model on the first INPUT_DATA_POINT observations
result = arma_model(data[:INPUT_DATA_POINT], P, Q)

# Predict with the fitted model (fit has len(data) + 1 values -- why?)
fit = result.predict(0, len(data))

# Plot
plt.figure(facecolor='white')
plt.title('ARMA Model Fitted Using ' + str(INPUT_DATA_POINT) + ' Data Points, P=' + str(P) + ' Q=' + str(Q) + '\n')
plt.plot(data, 'b-', label='data')
plt.plot(range(INPUT_DATA_POINT), result.fittedvalues, 'g--', label='fit')
plt.plot(range(len(data)), fit[:len(data)], 'r-', label='predict')
plt.legend(loc=4)
plt.show()
Here is the result, which is very strange, because it should be nearly identical to the result from my previous question linked above. I also don't quite understand why there are predictions for the first few data points, since those should not be valid (there are no previous values to compute them from).
I tried to write my own prediction code, shown below (the top part, identical to the code above, is omitted):
# Predict using the model manually
start_pos = max(result.k_ar, result.k_ma)
fit = []
for t in range(start_pos, len(data)):
    value = 0
    # AR part: AR coefficients applied to the previous observations
    for i in range(1, result.k_ar + 1):
        value += result.arparams[i - 1] * data[t - i]
    # MA part (here also applied to the previous observations)
    for i in range(1, result.k_ma + 1):
        value += result.maparams[i - 1] * data[t - i]
    fit.append(value)

# Plot
plt.figure(facecolor='white')
plt.title('ARMA Model Fitted Using ' + str(INPUT_DATA_POINT) + ' Data Points, P=' + str(P) + ' Q=' + str(Q) + '\n')
plt.plot(data, 'b-', label='data')
plt.plot(range(INPUT_DATA_POINT), result.fittedvalues, 'r+', label='fit')
plt.plot(range(start_pos, len(data)), fit, 'r-', label='predict')
plt.legend(loc=4)
plt.show()
This is the best result I got:
You trained the model on a subset of the data and then predicted out of sample. AR(MA) prediction quickly converges to the mean of the data, which is why your first plot flattens out after the training window. In your second attempt you are not doing out-of-sample forecasting at all; because each prediction is built from the actual past observations, you are just getting out-of-sample fitted values.
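You can get both kinds of prediction explicitly from your fitted result. Note that predict's start and end indices are inclusive, which is why predict(0, len(data)) gave you len(data) + 1 values. A sketch using your variables:

# One-step-ahead, in-sample predictions for the points used in fitting
in_sample = result.predict(start=0, end=INPUT_DATA_POINT - 1)

# True out-of-sample forecast for everything after the training window;
# each step is built from previous forecasts, so it decays toward the mean
n_ahead = len(data) - INPUT_DATA_POINT
forecast, stderr, conf_int = result.forecast(steps=n_ahead)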
The first few data points are fitted using the Kalman filter recursions (this is the distinction between full maximum likelihood and conditional maximum likelihood estimation), which is why you get predictions even for points that do not have a full set of lagged values behind them.
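You can see this directly with the objects from your question: the fitted values from the full-MLE fit cover every training observation, including the first P points.

# The Kalman-filter-based fit produces a fitted value for every observation,
# even the first P points that lack a complete lag history
print(len(result.fittedvalues))    # expected: INPUT_DATA_POINT
print(result.fittedvalues[:P])     # defined values, not NaN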
I would pick up a good forecasting textbook and review it to understand this behavior.
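If you do want to reproduce one-step-ahead predictions by hand, the textbook recipe also includes the constant (the process mean) and uses past residuals, not past observations, for the MA part. A rough sketch under those assumptions, using the sample mean as a simple stand-in for the estimated mean and your result/data objects (this is not the exact statsmodels computation):

import numpy as np

# Hand-rolled one-step-ahead ARMA prediction: deviations from the mean for
# the AR part, past residuals for the MA part
mu = np.mean(data[:INPUT_DATA_POINT])      # stand-in for the estimated process mean
resid = list(result.resid)                 # in-sample residuals from the fit

start_pos = max(result.k_ar, result.k_ma)
manual_fit = []
for t in range(start_pos, len(data)):
    value = mu
    for i in range(1, result.k_ar + 1):
        value += result.arparams[i - 1] * (data[t - i] - mu)
    for i in range(1, result.k_ma + 1):
        value += result.maparams[i - 1] * resid[t - i]
    manual_fit.append(value)
    if t >= len(result.resid):             # extend residuals beyond the fitted sample
        resid.append(data[t] - value)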