Linear Regression from Time Series Pandas

Tags: python, pandas

I would like to run a regression with a time series as the predictor. I'm trying to follow the answer given in this SO question (OLS with pandas: datetime index as predictor), but as far as I can tell it no longer works.

Am I missing something or is there a new way to do this?

import pandas as pd

rng = pd.date_range('1/1/2011', periods=4, freq='H')       
s = pd.Series(range(4), index = rng)                                                                      
z = s.reset_index()

pd.ols(x=z["index"], y=z[0])

I'm getting the error below. It is self-explanatory, but I'm wondering what I'm missing in reimplementing a solution that worked before.

TypeError: cannot astype a datetimelike from [datetime64[ns]] to [float64]


asked May 24 '15 15:05

canyon289

1 Answer

I'm not sure why pd.ols is so picky there (it does look to me like you followed the example correctly). I suspect this is due to changes in how pandas handles or stores datetime indexes, but I'm too lazy to explore this further. Anyway, since your datetime variable differs only in the hour, you could just extract the hour with the dt accessor:

pd.ols(x=pd.to_datetime(z["index"]).dt.hour, y=z[0])
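For anyone on a recent pandas: pd.ols was removed in pandas 0.20, so the call above will not run there. Here is a minimal sketch of the same hour-extraction idea, assuming np.polyfit as a stand-in for pd.ols. The names hours, slope and intercept are mine, not from the question.

```python
import numpy as np
import pandas as pd

# Same data as in the question. A Timedelta is passed as the frequency so the
# snippet does not depend on a particular frequency alias string.
rng = pd.date_range("2011-01-01", periods=4, freq=pd.Timedelta(hours=1))
z = pd.Series(range(4), index=rng).reset_index()

# Extract the hour of day as a plain numeric predictor.
hours = z["index"].dt.hour.to_numpy(dtype=float)
y = z[0].to_numpy(dtype=float)

# Degree-1 least-squares fit; np.polyfit returns [slope, intercept].
slope, intercept = np.polyfit(hours, y, 1)
print(slope, intercept)
```

With this toy data y equals the hour exactly, so the fitted slope comes out at essentially 1 and the intercept at essentially 0.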

However, that gives you an R-squared of 1, because y is an exact linear function of x here (both run 0, 1, 2, 3), so the fitted line passes through every point. You could replace range(4) with np.random.randn(4) (after import numpy as np), and then you'd get something that looks like normal regression results.

In [6]: z = pd.Series(np.random.randn(4), index = rng).reset_index()                                                               
        pd.ols(x=pd.to_datetime(z["index"]).dt.hour, y=z[0])
Out[6]: 

-------------------------Summary of Regression Analysis-------------------------

Formula: Y ~ <x> + <intercept>

Number of Observations:         4
Number of Degrees of Freedom:   2

R-squared:         0.7743
Adj R-squared:     0.6615

Rmse:              0.5156

F-stat (1, 2):     6.8626, p-value:     0.1200

Degrees of Freedom: model 1, resid 2

-----------------------Summary of Estimated Coefficients------------------------
      Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
             x    -0.6040     0.2306      -2.62     0.1200    -1.0560    -0.1521
     intercept     0.2915     0.4314       0.68     0.5689    -0.5540     1.1370
---------------------------------End of Summary---------------------------------
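As a sanity check on how the numbers in that summary hang together, here is a small sketch that recomputes the adjusted R-squared, the F-stat and the t-stat for x from the printed values, using the standard formulas for a single-predictor OLS with an intercept. The variable names are mine, and the inputs are just the rounded figures copied from the output above.

```python
# Values copied from the summary above (already rounded there).
r2 = 0.7743
n_obs = 4                      # Number of Observations
n_params = 2                   # slope + intercept
df_model = n_params - 1        # 1
df_resid = n_obs - n_params    # 2

# Adjusted R-squared penalises R-squared for the number of fitted parameters.
adj_r2 = 1 - (1 - r2) * (n_obs - 1) / df_resid

# F-statistic for the overall regression.
f_stat = (r2 / df_model) / ((1 - r2) / df_resid)

# t-statistic for x is the coefficient divided by its standard error.
t_stat = -0.6040 / 0.2306

print(adj_r2, f_stat, t_stat)
```

The results agree with the table up to rounding, and with one predictor the F-stat is just the square of the t-stat for x, which is why both report the same p-value.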

Alternatively, you could convert the index to an integer, although I found this didn't work very well on its own. I'm assuming that is because the integers represent nanoseconds since the epoch, so they are very large and cause precision issues. Converting to integer and then dividing by a trillion or so did work, and gave essentially the same results as using dt.hour (i.e. the same r-squared):

pd.ols(x=pd.to_datetime(z["index"]).astype(int)/1e12, y=z[0])
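To illustrate why the rescaled integers give the same r-squared as dt.hour, here is a rough sketch without pd.ols (which is gone from recent pandas). It assumes a seeded NumPy generator in place of np.random.randn and uses the squared correlation as the R-squared of a one-predictor fit with intercept. The helper r_squared and the x_* names are mine.

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2011-01-01", periods=4, freq=pd.Timedelta(hours=1))
# Seeded noise so the run is repeatable (stand-in for np.random.randn(4)).
noise = np.random.default_rng(0).standard_normal(4)
z = pd.Series(noise, index=rng).reset_index()
y = z[0].to_numpy()

# Predictor 1: hour of day.
x_hour = z["index"].dt.hour.to_numpy(dtype=float)

# Predictor 2: raw integer timestamps (nanoseconds since the epoch when the
# column is datetime64[ns]), scaled down by 1e12.
x_scaled = z["index"].astype("int64").to_numpy() / 1e12

def r_squared(x, y):
    # For a single predictor plus intercept, R-squared equals the squared
    # Pearson correlation between x and y.
    return np.corrcoef(x, y)[0, 1] ** 2

r2_hour = r_squared(x_hour, y)
r2_scaled = r_squared(x_scaled, y)
print(r2_hour, r2_scaled)
```

The two values match because the scaled timestamps are just a shifted and rescaled version of the hour, and a linear change of the predictor changes the coefficients but not the fit.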

Source of the error message

FWIW, it looks like that error message is coming from something like this:

pd.to_datetime(z["index"]).astype(float)

Although a fairly obvious workaround is this:

pd.to_datetime(z["index"]).astype(int).astype(float)
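A small sketch of both behaviours side by side, assuming a datetime64 column like the one in the question. The names col, direct_cast_failed and as_float are mine, "int64" is spelled out instead of int so the width does not depend on the platform, and the exact wording of the error message differs between pandas versions.

```python
import pandas as pd

rng = pd.date_range("2011-01-01", periods=4, freq=pd.Timedelta(hours=1))
col = pd.Series(rng)  # datetime64 column, like z["index"] in the question

# Direct datetime -> float cast: pandas refuses this with a TypeError.
try:
    col.astype(float)
    direct_cast_failed = False
except TypeError as exc:
    direct_cast_failed = True
    print("direct cast failed:", exc)

# Two-step workaround: datetime -> int64 -> float.
as_float = col.astype("int64").astype(float)
print(as_float.dtype)
```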


answered Oct 10 '22 05:10

JohnE
