 

Is there a way to implement sample weights in a statsmodels logistic regression?

I'm using statsmodels for logistic regression analysis in Python. For example:

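import statsmodels.api as sm
import numpy as np

x = np.arange(0, 1, 0.01)    # 100 predictor values in [0, 1)
y = np.random.rand(100)
y[y <= x] = 1                # outcome is 1 with probability x
y[y != 1] = 0                # every other draw becomes 0
x = sm.add_constant(x)       # add the intercept column
lr = sm.Logit(y, x)
result = lr.fit().summary()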

However, I want to assign different weights to my observations. I'm combining 4 datasets of different sizes and want to weight the analysis so that the observations from the largest dataset do not dominate the model.
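A common way to keep the largest dataset from dominating is to give every row a weight inversely proportional to the size of the dataset it came from, scaled so that each dataset contributes the same total weight and the weights still sum to the overall row count. The pure-Python sketch below only computes such weights; the function name `balanced_weights` and the dataset labels are hypothetical, and whether equal per-dataset influence is the right goal depends on the analysis. (Recent statsmodels versions accept `freq_weights` and `var_weights` arguments on `sm.GLM`; check the documentation for your version before relying on either.)

```python
from collections import Counter


def balanced_weights(labels):
    """Return one weight per row so that every dataset gets equal total weight.

    labels: sequence giving, for each row, the name of the dataset it came
    from (hypothetical input; any hashable labels work). The weights sum to
    len(labels), so the overall amount of weight is unchanged.
    """
    sizes = Counter(labels)      # number of rows in each dataset
    n_total = len(labels)
    n_groups = len(sizes)
    # Each dataset should contribute n_total / n_groups in total,
    # spread evenly over its own rows.
    return [n_total / (n_groups * sizes[lab]) for lab in labels]


# Usage: four datasets of very different sizes (made-up sizes).
labels = ["A"] * 500 + ["B"] * 300 + ["C"] * 150 + ["D"] * 50
w = balanced_weights(labels)
print(w[0], w[500], w[800], w[950])   # one weight from each dataset
```

Here the datasets of 500, 300, 150 and 50 rows get weights 0.5, 5/6, 5/3 and 5.0, so each one totals 250 of the 1000 units of weight.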

user2448817 asked Apr 28 '14 11:04


1 Answer

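It took me a while to work this out, but it is actually quite easy to create a logit model in statsmodels with weighted rows (multiple observations per row): fit a binomial GLM whose response has two columns, the number of successes and the number of failures for each row. Here's how it's done: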

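import statsmodels.api as sm
# trainingdata: DataFrame with 'Successes'/'Failures' counts per row, a 'const' column of ones, and predictors A-D
logmodel = sm.GLM(trainingdata[['Successes', 'Failures']], trainingdata[['const', 'A', 'B', 'C', 'D']],
                  family=sm.families.Binomial(link=sm.families.links.Logit())).fit()  # pass a link instance, not a class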
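The two-column response works as "multiple observations per row" because a row with s successes and f failures contributes s*log(p) + f*log(1 - p) to the binomial log-likelihood (plus a log binomial coefficient that does not depend on the model coefficients), which is exactly what s + f separate 0/1 rows with the same covariates contribute. The pure-Python sketch below checks this on a tiny made-up dataset; `group_counts`, `prob`, `bernoulli_loglik` and `grouped_loglik` are hypothetical helpers written for illustration, not statsmodels functions.

```python
import math
from collections import defaultdict


def group_counts(rows):
    """Collapse (covariates, y) rows into {covariates: [successes, failures]}.

    rows: iterable of (tuple_of_covariates, y) with y in {0, 1}
    (hypothetical input format).
    """
    counts = defaultdict(lambda: [0, 0])
    for xs, y in rows:
        counts[xs][0 if y == 1 else 1] += 1
    return dict(counts)


def prob(beta, xs):
    """Logistic probability; beta[0] is the intercept (the 'const' column)."""
    eta = beta[0] + sum(b * x for b, x in zip(beta[1:], xs))
    return 1.0 / (1.0 + math.exp(-eta))


def bernoulli_loglik(beta, rows):
    """Row-by-row log-likelihood of ungrouped 0/1 data."""
    total = 0.0
    for xs, y in rows:
        p = prob(beta, xs)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def grouped_loglik(beta, counts):
    """Binomial log-likelihood of grouped counts, without the log C(n, s)
    term, which does not depend on beta."""
    total = 0.0
    for xs, (s, f) in counts.items():
        p = prob(beta, xs)
        total += s * math.log(p) + f * math.log(1.0 - p)
    return total


# Usage: six 0/1 rows with one predictor collapse into two grouped rows.
rows = [((0.0,), 0), ((0.0,), 0), ((0.0,), 1),
        ((1.0,), 1), ((1.0,), 1), ((1.0,), 0)]
counts = group_counts(rows)
print(counts)   # {(0.0,): [1, 2], (1.0,): [2, 1]}
beta = (-0.3, 1.2)   # arbitrary coefficients for the comparison
print(bernoulli_loglik(beta, rows), grouped_loglik(beta, counts))
```

Multiplying a row's counts by a weight w multiplies its log-likelihood contribution by w, which is how this format relates to the sample weights asked about in the question; whether a given library accepts non-integer counts is worth checking before relying on it.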
user3805082 answered Oct 13 '22 10:10