
Python: pmdarima, autoarima does not work with large data

I have a DataFrame with around 80,000 observations taken every 15 minutes. I set the seasonal parameter m to 96, because the pattern repeats every 24 hours. When I pass this to auto_arima, it runs for a long time (several hours) and then raises the following error:

MemoryError: Unable to allocate 5.50 GiB for an array with shape (99, 99, 75361) and data type float64
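The size in that message can be checked by hand: it is simply the number of elements in the reported shape times 8 bytes per float64. The small sketch below does that arithmetic. The helper name `array_gib` is made up for illustration, and the second shape is a purely hypothetical comparison (not something pmdarima reported) that shows how quickly the requirement shrinks when the two leading dimensions get smaller.

```python
import math

def array_gib(shape, itemsize=8):
    """Memory needed for a dense array of the given shape, in GiB.

    itemsize=8 corresponds to float64.
    """
    return math.prod(shape) * itemsize / 2**30

# Shape copied verbatim from the MemoryError above.
reported = array_gib((99, 99, 75361))
print(f"{reported:.2f} GiB")  # -> 5.50 GiB

# Hypothetical shape for comparison only: same length, smaller leading block.
smaller = array_gib((25, 25, 75361))
print(f"{smaller:.2f} GiB")
```

The leading 99 x 99 block is presumably tied to the size of the seasonal model (it is close to m=96), so the memory cost would grow roughly with the square of m times the number of observations. That reading is an assumption, not something the error message states.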

The code that I am using:

from pmdarima import auto_arima

stepwise_fit = auto_arima(df['Hges'], seasonal=True, m=96, stepwise=True,
                          stationary=True, trace=True)
print(stepwise_fit.summary())

I tried resampling to hourly values to reduce the amount of data, and set m to 24, but my computer still cannot compute the result.
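For reference, a minimal sketch of what such an hourly resampling step could look like with pandas. The data here is synthetic, and it assumes the DataFrame has a DatetimeIndex and that averaging is the right way to aggregate 'Hges'; neither is stated in the question.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data: one week of 15-minute readings.
idx = pd.date_range("2023-01-01", periods=7 * 96, freq=pd.Timedelta(minutes=15))
df = pd.DataFrame({"Hges": np.sin(2 * np.pi * np.arange(len(idx)) / 96)}, index=idx)

# Average each block of four 15-minute readings into one hourly value
# (mean aggregation is an assumption; a sum may suit some quantities better).
hourly = df["Hges"].resample(pd.Timedelta(hours=1)).mean()

# The daily cycle now spans 24 observations instead of 96.
m_hourly = 24
print(len(df), "->", len(hourly))
```

This cuts the series length by a factor of four and m from 96 to 24, which is the reduction described above.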

How do I find the weighting factors with auto_arima when dealing with large data?

christianbauer1 asked Jan 25 '23 19:01



1 Answer

I don't recall exactly where I read this, but neither auto.arima nor pmdarima is really optimized to scale, which might explain the issues you are facing.

But there are more important things to note about your question. With 80K data points at 15-minute intervals, ARIMA probably isn't the best type of model for your use case anyway:

  • With the frequency and density of your data, it is likely that there are multiple cycles/seasonal patterns, and ARIMA can handle only one seasonal component. So at the very least you should try a model that can handle multiple seasonalities like STS or Prophet (TBATS in R can also handle multiple seasonalities, but it is likely to suffer from the same issues as auto.arima, since it is in the same package).
  • At 80K points and 15-minute measurement intervals, I assume you are most likely dealing with a "physical" time series that is the output of a sensor or some other metering/monitoring device (electrical load, network traffic, etc.). These types of time series are usually very good use cases for LSTM or other deep-learning-based models instead of ARIMA.
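To make the "multiple seasonalities" point concrete, here is a rough sketch of fitting two seasonal periods at once. It is neither STS nor Prophet: it swaps in a plain harmonic (Fourier-term) regression with NumPy least squares, purely as an illustration. The data is synthetic, the weekly period of 672 observations is an assumption about what a second cycle might look like, and `fourier_terms` is a hypothetical helper.

```python
import numpy as np

def fourier_terms(t, period, order):
    """Sine/cosine columns for one seasonal period (in number of observations)."""
    k = np.arange(1, order + 1)
    angles = 2 * np.pi * np.outer(t, k) / period
    return np.hstack([np.sin(angles), np.cos(angles)])

# Synthetic 15-minute series with a daily (96) and an assumed weekly (672) cycle.
rng = np.random.default_rng(0)
n = 4 * 672  # four weeks of data
t = np.arange(n)
y = (10 + 3 * np.sin(2 * np.pi * t / 96) + 2 * np.cos(2 * np.pi * t / 672)
     + rng.normal(scale=0.1, size=n))

# Design matrix: intercept plus two harmonics for each of the two periods.
X = np.hstack([np.ones((n, 1)), fourier_terms(t, 96, 2), fourier_terms(t, 672, 2)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residual spread after removing both seasonal components.
resid_std = float(np.std(y - X @ coef))
print(coef.round(2), round(resid_std, 3))
```

The design matrix here has only nine columns regardless of the period lengths, so memory grows linearly with the number of observations. A real series would still need something to model the remaining autocorrelation in the residuals.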
Alex Kinman answered Feb 04 '23 21:02
