I am trying to run `grangercausalitytests` on two time series:
```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

n = 1000
ls = np.linspace(0, 2*np.pi, n)
df1 = pd.DataFrame(np.sin(ls))
df2 = pd.DataFrame(2*np.sin(1+ls))
df = pd.concat([df1, df2], axis=1)
df.plot()

grangercausalitytests(df, maxlag=20)
```
However, I am getting
```
Granger Causality
number of lags (no zero) 1
ssr based F test:         F=272078066917221398041264652288.0000, p=0.0000  , df_denom=996, df_num=1
ssr based chi2 test:   chi2=272897579166972095424217743360.0000, p=0.0000  , df=1
likelihood ratio test: chi2=60811.2671, p=0.0000  , df=1
parameter F test:         F=272078066917220553616334520320.0000, p=0.0000  , df_denom=996, df_num=1

Granger Causality
number of lags (no zero) 2
ssr based F test:         F=7296.6976, p=0.0000  , df_denom=995, df_num=2
ssr based chi2 test:   chi2=14637.3954, p=0.0000  , df=2
likelihood ratio test: chi2=2746.0362, p=0.0000  , df=2
parameter F test:         F=13296850090491009488285469769728.0000, p=0.0000  , df_denom=995, df_num=2
...

/usr/local/lib/python3.5/dist-packages/numpy/linalg/linalg.py in _raise_linalgerror_singular(err, flag)
     88
     89 def _raise_linalgerror_singular(err, flag):
---> 90     raise LinAlgError("Singular matrix")
     91
     92 def _raise_linalgerror_nonposdef(err, flag):

LinAlgError: Singular matrix
```
and I am not sure why this is the case.
How to fix the error: the only way to get around it is to supply a matrix that is not singular. Inverting a non-singular matrix raises no error.
`LinAlgError` is a generic Python-exception-derived object raised by linalg functions. It is a general-purpose exception class, raised programmatically in linalg functions when a linear-algebra-related condition would prevent further correct execution of the function.
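As a minimal sketch of what that exception looks like in practice, the snippet below inverts one singular and one non-singular 2x2 matrix with `np.linalg.inv`. The helper name `try_invert` and both example matrices are made up for illustration.

```python
import numpy as np

# Second row is exactly 2 * first row, so the determinant is zero.
singular = np.array([[1.0, 2.0], [2.0, 4.0]])
# Rows are linearly independent, so an inverse exists.
regular = np.array([[1.0, 2.0], [3.0, 4.0]])


def try_invert(matrix):
    """Return the inverse, or None if NumPy reports the matrix as singular."""
    try:
        return np.linalg.inv(matrix)
    except np.linalg.LinAlgError as err:
        print(f"inversion failed: {err}")
        return None


inv_singular = try_invert(singular)
inv_regular = try_invert(regular)
```

Catching the exception only reports the problem. The input still has to be changed before the inverse exists.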
The problem arises from perfect linear dependence in your data: both series are noise-free sinusoids, so each value is predicted exactly by a few lagged values. From the traceback you can see that a Wald test is used internally to test the parameters of the lagged series. This requires an estimate of the parameter covariance matrix (which is near zero here) and its inverse, as you can also see in the line `invcov = np.linalg.inv(cov_p)` in the traceback. For some maximum lag number (>= 5) this near-zero matrix becomes singular, and the test crashes. If you add just a little noise to your data, the error disappears:
```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

n = 1000
ls = np.linspace(0, 2*np.pi, n)
df1Clean = pd.DataFrame(np.sin(ls))
df2Clean = pd.DataFrame(2*np.sin(ls+1))
dfClean = pd.concat([df1Clean, df2Clean], axis=1)
# A tiny amount of uniform noise breaks the exact linear dependence
dfDirty = dfClean + 0.00001*np.random.rand(n, 2)

# Catch the error so that the second call is still reached
try:
    grangercausalitytests(dfClean, maxlag=20, verbose=False)  # Raises LinAlgError
except np.linalg.LinAlgError as err:
    print("clean data:", err)

grangercausalitytests(dfDirty, maxlag=20, verbose=False)  # Runs fine
```
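To see why the noise helps, here is a rough sketch that builds a lagged design matrix by hand and compares its rank with and without noise. The helper `lagged_design` is hypothetical and only approximates the regression that `grangercausalitytests` sets up (lags of both series plus a constant). It is not the statsmodels implementation.

```python
import numpy as np


def lagged_design(x, y, lags):
    """Stack lags 1..lags of x and y plus a constant column (illustrative only)."""
    n = len(x)
    cols = [x[lags - k : n - k] for k in range(1, lags + 1)]
    cols += [y[lags - k : n - k] for k in range(1, lags + 1)]
    cols.append(np.ones(n - lags))
    return np.column_stack(cols)


rng = np.random.default_rng(0)
n = 1000
ls = np.linspace(0, 2 * np.pi, n)
x_clean = np.sin(ls)
y_clean = 2 * np.sin(ls + 1)
# Same noise scale as in the answer above
x_noisy = x_clean + 1e-5 * rng.random(n)
y_noisy = y_clean + 1e-5 * rng.random(n)

X_clean = lagged_design(x_clean, y_clean, lags=5)
X_noisy = lagged_design(x_noisy, y_noisy, lags=5)

# A rank below the number of columns means the regressors are collinear.
rank_clean = np.linalg.matrix_rank(X_clean)
rank_noisy = np.linalg.matrix_rank(X_noisy)
print("columns:", X_clean.shape[1])
print("rank (clean):", rank_clean)
print("rank (noisy):", rank_noisy)
```

On the clean data the rank should come out well below the number of columns, because every lagged sinusoid is a combination of the same sine and cosine. With noise added, the columns become linearly independent.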
Another thing to keep an eye out for is duplicate columns. Duplicate columns have a correlation of 1.0, which results in singularity. It is also possible that you have two features that are perfectly correlated. An easy way to check this is with `df.corr()`: look for pairs of columns with a correlation of 1.0.
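A small sketch of that check, assuming a helper called `perfectly_correlated_pairs` (a made-up name) and a tolerance of `1e-9` chosen arbitrarily to absorb floating-point rounding. It also flags a correlation of -1.0, which causes the same singularity.

```python
import numpy as np
import pandas as pd


def perfectly_correlated_pairs(df, tol=1e-9):
    """Return column pairs whose absolute Pearson correlation is 1 within tol."""
    corr = df.corr().to_numpy()
    cols = list(df.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if abs(abs(corr[i, j]) - 1.0) <= tol:
                pairs.append((cols[i], cols[j]))
    return pairs


rng = np.random.default_rng(0)
a = rng.normal(size=200)
example = pd.DataFrame({
    "a": a,
    "a_copy": a,             # exact duplicate of "a"
    "a_scaled": -3 * a + 1,  # perfectly (negatively) correlated with "a"
    "b": rng.normal(size=200),
})

pairs = perfectly_correlated_pairs(example)
print(pairs)
```

Dropping one column from each reported pair removes that source of singularity.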