 

error using statsmodels OLS: returning nan values [closed]

I have a data set like:

Growth  NHSPSTY% Index  USURTOT Index   GLPFTOCI Index  CPTICHNG Index  NAPMPMI Index   RSTAXYOY Index  SAARTOTL Index  USASHVTK Index  CONCCONF Index  LEI TOTL Index  SPX Index   TOT_DEBT_TO_TOT_EQY BDIY Index  cry index   CO1 Comdty
Date                                                                
1998-03-31  4.1 7.5 4.7 0.121000    83.5325 52.9    2.9 -0.032258   0.404   133.80  88.9    0.455185    197.26  966 169.04  14.26
1998-06-30  3.8 9.8 4.5 0.125556    82.2970 48.9    4.5 0.154930    0.393   138.23  88.6    0.280973    204.65  856 152.58  13.38

I wanted to run an OLS regression, but every parameter came back as nan, with these warnings:

/Users/jake/anaconda3/lib/python3.6/site-packages/scipy/stats/_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in greater
  return (self.a < x) & (x < self.b)
/Users/jake/anaconda3/lib/python3.6/site-packages/scipy/stats/_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in less
  return (self.a < x) & (x < self.b)
/Users/jake/anaconda3/lib/python3.6/site-packages/scipy/stats/_distn_infrastructure.py:1821: RuntimeWarning: invalid value encountered in less_equal
  cond2 = cond0 & (x <= self.a)


coef    std err t   P>|t|   [0.025  0.975]
const   nan nan nan nan nan nan
NHSPSTY% Index  nan nan nan nan nan nan
USURTOT Index   nan nan nan nan nan nan
GLPFTOCI Index  nan nan nan nan nan nan
CPTICHNG Index  nan nan nan nan nan nan

My command:

import statsmodels.api as sm
model = sm.OLS(data.Growth,sm.add_constant(data.iloc[:,1:])).fit()
model.summary()
JAKE asked Jun 29 '26 03:06



1 Answer

Without further information, such as the data, it is not possible to give an accurate answer. The best we can do is make informed guesses, so here I list all the reasons I can think of for nan values in the output from statsmodels, along with some simple code to check for some of them:

1. Missing Data (NaNs) in Input

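If the dependent variable (Growth) or any predictors contain missing values, the NaNs can propagate into every estimate. OLS requires complete data, and by default `sm.OLS` uses `missing='none'`, which performs no NaN check. Rows with NaNs are dropped only if you pass `missing='drop'` (or call `dropna()` yourself), which can in turn leave insufficient data for estimation (Allison, 2001).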

2. Perfect or Near-Perfect Multicollinearity

Predictors that are exact linear combinations of others (e.g., X_1 = 2 × X_2) make the matrix X^T X singular, so the coefficients cannot be uniquely estimated (Belsley et al., 1980). High correlation (e.g., > 0.99) between predictors can also destabilise estimates.
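
One rough way to detect this, sketched here with NumPy on made-up columns (`x1`, `x2` and `x3` are hypothetical), is to compare the rank of the design matrix with its number of columns and to inspect the pairwise correlations:

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
x2 = 2.0 * x1                 # exact linear combination of x1
x3 = rng.normal(size=50)      # unrelated predictor

# Design matrix with an intercept column
X = np.column_stack([np.ones(50), x1, x2, x3])

rank = np.linalg.matrix_rank(X)
print("columns:", X.shape[1], "rank:", rank)  # rank < columns means rank deficiency

# Pairwise correlations between the predictors (intercept excluded)
corr = np.corrcoef(X[:, 1:], rowvar=False)
print(np.round(corr, 3))
```

With a pandas DataFrame the same check can be run on `data.to_numpy()`, assuming all columns are numeric.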

3. Constant or Zero-Variance Predictors

Columns with no variation (e.g., all zeros) are redundant when an intercept is included. This creates rank deficiency, leading to NaN coefficients.

4. Mismatched Dimensions Between X and y

If the number of rows in X and y differs due to misalignment or implicit dropping of NaNs, the regression will fail.

5. Non-Numeric Data Types

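String or object-type columns in X may be coerced to NaN when they are converted to numbers (for example with `pd.to_numeric(..., errors='coerce')`), so check the dtypes before fitting.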
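
For illustration, here is a small pandas sketch (the column names and values are invented) that lists the non-numeric columns and shows how a forced conversion with `errors='coerce'` turns unparseable strings into NaN:

```python
import pandas as pd

# Invented example frame: "rate" holds numbers stored as strings
df = pd.DataFrame({
    "Growth": [4.1, 3.8, 3.5],
    "rate": ["7.5", "n/a", "9.8"],
})

# Columns whose dtype is not numeric
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)

# Forced conversion: anything that cannot be parsed becomes NaN
converted = pd.to_numeric(df["rate"], errors="coerce")
print(converted.isna().sum(), "value(s) became NaN")
```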

6. Too Few Observations Relative to Predictors (p >= n)

When predictors (including the intercept) outnumber observations, the system is underdetermined, yielding no unique solution.
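
A quick check, sketched with NumPy on random data (the sizes are arbitrary), is to compare the number of rows with the rank of the design matrix. When no degrees of freedom are left over, the fit reproduces y exactly and there is nothing from which to estimate the residual variance:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 3, 5                       # fewer observations than columns
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

rank = np.linalg.matrix_rank(X)
df_resid = n - rank               # residual degrees of freedom
print("n:", n, "columns:", p, "residual df:", df_resid)

# Minimum-norm least-squares solution; it fits y perfectly here
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(X @ beta, y))
```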

7. Numeric Instability in Matrix Inversion

Ill-conditioned matrices (high condition number) can produce NaN due to floating-point errors.

References

Allison, P. D. (2001). Missing data. Sage.

Belsley, D. A., Kuh, E., & Welsch, R. E. (1980). Regression diagnostics: Identifying influential data and sources of collinearity. Wiley.

Appendix - Diagnostic Code

The following is some basic diagnostic code

```python
# 1. Summary of missing values
print(data.isnull().sum())

# 2. Drop rows with any NaNs
data_clean = data.dropna()

# 3. Check for constant or all-zero columns
print((data_clean.iloc[:, 1:].nunique() <= 1))

# 4. Check for object types
print(data_clean.dtypes)

# 5. Confirm dimensions
print("Observations (n):", data_clean.shape[0])
print("Predictors (p):", data_clean.shape[1] - 1)

# 6. Condition number (multicollinearity check)
import numpy as np
import statsmodels.api as sm
X = sm.add_constant(data_clean.drop(columns='Growth'))
print("Condition number:", np.linalg.cond(X))

# 7. Final model (cleaned)
y = data_clean['Growth']
model = sm.OLS(y, X).fit()
print(model.summary())
```
Robert Long answered Jul 01 '26 17:07



