Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using StatsModels to plot quantile regression for 2nd order polynomial

I am following the StatsModels example here to plot quantile regression lines. With only slight modification for my data, the example works great, producing this plot (note that I have modified the code to only plot the 0.05, 0.25, 0.5, 0.75, and 0.95 quantiles) : enter image description here

However, I would like to plot the OLS fit and corresponding quantiles for a 2nd order polynomial fit (instead of linear). For example, here is the 2nd-order OLS line for the same data: enter image description here

How can I modify the code in the linked example to produce non-linear quantiles?

Here is my relevant code modified from the linked example to produce the 1st plot:

d = {'temp': x, 'dens': y}
df = pd.DataFrame(data=d)

# Least Absolute Deviation
# 
# The LAD model is a special case of quantile regression where q=0.5

mod = smf.quantreg('dens ~ temp', df)
res = mod.fit(q=.5)
print(res.summary())

# Prepare data for plotting
# 
# For convenience, we place the quantile regression results in a Pandas DataFrame, and the OLS results in a dictionary.

quantiles = [.05, .25, .50, .75, .95]
def fit_model(q):
    res = mod.fit(q=q)
    return [q, res.params['Intercept'], res.params['temp']] + res.conf_int().ix['temp'].tolist()

models = [fit_model(x) for x in quantiles]
models = pd.DataFrame(models, columns=['q', 'a', 'b','lb','ub'])

ols = smf.ols('dens ~ temp', df).fit()
ols_ci = ols.conf_int().ix['temp'].tolist()
ols = dict(a = ols.params['Intercept'],
           b = ols.params['temp'],
           lb = ols_ci[0],
           ub = ols_ci[1])

print(models)
print(ols)

x = np.arange(df.temp.min(), df.temp.max(), 50)
get_y = lambda a, b: a + b * x

for i in range(models.shape[0]):
    y = get_y(models.a[i], models.b[i])
    plt.plot(x, y, linestyle='dotted', color='grey')

y = get_y(ols['a'], ols['b'])
plt.plot(x, y, color='red', label='OLS')

plt.scatter(df.temp, df.dens, alpha=.2)
plt.xlim((-10, 40))
plt.ylim((0, 0.4))
plt.legend()
plt.xlabel('temp')
plt.ylabel('dens')
plt.show()
like image 743
PJW Avatar asked Feb 03 '16 18:02

PJW


People also ask

What does quantile regression do?

Quantile regression allows the analyst to drop the assumption that variables operate the same at the upper tails of the distribution as at the mean and to identify the factors that are important determinants of expenditures and quality of care for different subgroups of patients.

Is statsmodel good for simple linear regression?

Statsmodel package is rich with descriptive statistics and provides number of models. Although simple linear line won’t fit our x data still let’s see how it performs. where b 0 is bias and b 1 is weight for simple Linear Regression equation. Statsmodel provides OLS model (ordinary Least Sqaures) for simple linear regression.

How do you fit a 2nd order polynomial to data?

Fit a Second Order Polynomial to the given data. Curve fitting Polynomial Regression Let Y = a1 + a2x + a3x2 ( 2 nd order polynomial ). Here, m = 3 ( because to fit a curve we need at least 3 points ). Since the order of the polynomial is 2, therefore we will have 3 simultaneous equations as below.

What is statsmodel package?

Statsmodel package is rich with descriptive statistics and provides number of models. Although simple linear line won’t fit our x data still let’s see how it performs. where b 0 is bias and b 1 is weight for simple Linear Regression equation.

What is the difference between sklearn and statsmodel?

Particularly, sklearn doesnt provide statistical inference of model parameters such as ‘standard errors’. Statsmodel package is rich with descriptive statistics and provides number of models. Although simple linear line won’t fit our x data still let’s see how it performs. where b 0 is bias and b 1 is weight for simple Linear Regression equation.


1 Answers

After a day of looking into this, came up with a solution, so posting my own answer. Much credit to Josef Perktold at StatsModels for assistance.

Here is the relevant code and plot:

d = {'temp': x, 'dens': y}
df = pd.DataFrame(data=d)

x1 = pd.DataFrame({'temp': np.linspace(df.temp.min(), df.temp.max(), 200)})

poly_2 = smf.ols(formula='dens ~ 1 + temp + I(temp ** 2.0)', data=df).fit()
plt.plot(x, y, 'o', alpha=0.2)
plt.plot(x1.temp, poly_2.predict(x1), 'r-', 
         label='2nd order poly fit, $R^2$=%.2f' % poly_2.rsquared, 
         alpha=0.9)
plt.xlim((-10, 50))
plt.ylim((0, 0.25))
plt.xlabel('mean air temp')
plt.ylabel('density')
plt.legend(loc="upper left")


# with quantile regression

# Least Absolute Deviation
# The LAD model is a special case of quantile regression where q=0.5

mod = smf.quantreg('dens ~ temp + I(temp ** 2.0)', df)
res = mod.fit(q=.5)
print(res.summary())

# Quantile regression for 5 quantiles

quantiles = [.05, .25, .50, .75, .95]

# get all result instances in a list
res_all = [mod.fit(q=q) for q in quantiles]

res_ols = smf.ols('dens ~ temp + I(temp ** 2.0)', df).fit()


plt.figure()

# create x for prediction
x_p = np.linspace(df.temp.min(), df.temp.max(), 50)
df_p = pd.DataFrame({'temp': x_p})

for qm, res in zip(quantiles, res_all):
    # get prediction for the model and plot
    # here we use a dict which works the same way as the df in ols
    plt.plot(x_p, res.predict({'temp': x_p}), linestyle='--', lw=1, 
             color='k', label='q=%.2F' % qm, zorder=2)

y_ols_predicted = res_ols.predict(df_p)
plt.plot(x_p, y_ols_predicted, color='red', zorder=1)
#plt.scatter(df.temp, df.dens, alpha=.2)
plt.plot(df.temp, df.dens, 'o', alpha=.2, zorder=0)
plt.xlim((-10, 50))
plt.ylim((0, 0.25))
#plt.legend(loc="upper center")
plt.xlabel('mean air temp')
plt.ylabel('density')
plt.title('')
plt.show()

enter image description here

red line: 2nd order polynomial fit

black dashed lines: 5th, 25th, 50th, 75th, 95th percentiles

like image 126
PJW Avatar answered Sep 29 '22 14:09

PJW