I want to ask a quick question related to regression analysis in python pandas. So, assume that I have the following datasets:
Group Y X
1 10 6
1 5 4
1 3 1
2 4 6
2 2 4
2 3 9
My aim is to run regression; Y is dependent and X is independent variable. The issue is I want to run this regression by Group and print the coefficients in a new data set. So, the results should be like:
Group Coefficient
1 0.25 (lets assume that coefficient is 0.25)
2 0.30
I hope I can explain my question. Many thanks in advance for your help.
I am not sure about the type of regression you need, but this is how you do an OLS (Ordinary least squares):
import pandas as pd
import statsmodels.api as sm
def regress(data, yvar, xvars):
Y = data[yvar]
X = data[xvars]
X['intercept'] = 1.
result = sm.OLS(Y, X).fit()
return result.params
#This is what you need
df.groupby('Group').apply(regress, 'Y', ['X'])
You can define your regression function and pass parameters to it as mentioned.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With