Is there an R-svydesign equivalent in Python to apply complex survey design weights?

Is there a way to incorporate survey weights from complex survey designs when computing descriptive statistics (e.g., frequencies and crosstabs) and fitting linear or logistic regression models?

Complex survey designs typically involve stratification and clustering so that the sample can represent a larger population (e.g., the CDC National Health Interview Survey, p. 41), and the analysis has to account for both.

A couple of posts, "Design-corrected Variance Estimation in Python" and "How to conduct SAS Proc SurveyFreq with strata in Python", mention the samplics, Quantipy and PandaSurvey packages.

Unfortunately, none of those packages seems to fully support survey data analysis, and their documentation is quite limited.

In R, I know this is doable with the survey package, by applying svydesign to the data frame:

# Load packages
library(haven)   # read_dta
library(survey)  # svydesign, svytable

# Read in data
df <- read_dta('data.dta')

# Strata with a single PSU: this is a global option, not a svydesign argument
options(survey.lonely.psu = "adjust")

# Define survey design (IDs nested within strata)
df_weighted <- svydesign(id=~ID, weights=~analwt, strata=~Final_strata, nest=TRUE, data=df)

# Weighted crosstab as row percentages
xtab <- prop.table(svytable(~var1+var2, df_weighted), 1) * 100

In SAS, an example would be:

proc surveyfreq data=PS_Data;
tables INT4a;
strata final_strata;
cluster granteeid;
weight analwt;
run;

In Stata, this is done by:

svyset GranteeID [pweight=ANALWT], strata(Final_strata) vce(linearized)
Then use the svy prefix for analysis commands.

I've tried the crosstab function in pandas. It accepts the weights, but it doesn't seem to have any way to account for the stratification and clustering.

import pandas as pd

xtab_weighted = pd.crosstab(
    df['var1'], df['var2'],
    values=df['analwt'],
    aggfunc='sum', dropna=True, normalize=True)

asked Sep 01 '25 01:09 by Datadrivendog



1 Answer

Python's samplics library looks like a promising option. It includes functions for both summary statistics and a few statistical tests, including chi-squared tests and t-tests; the regression code is still a work in progress. It doesn't appear to be as extensive as R's survey package, but it is worth looking into if you wish to stay within the Python ecosystem when analyzing weighted survey data.
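To make concrete what a design-aware estimator adds beyond the weighted point estimate, here is a minimal hand-rolled sketch of a weighted proportion with a Taylor-linearized standard error, using only the standard library. This is not samplics' API; the function name weighted_proportion and the toy data are made up for illustration. It assumes PSUs are nested within strata (like nest=TRUE in R), uses the with-replacement approximation, and simply raises an error on single-PSU strata rather than adjusting for them.

```python
from collections import defaultdict
from math import sqrt


def weighted_proportion(y, weights, strata, psus):
    """Weighted proportion and its Taylor-linearized standard error.

    y: 0/1 indicators; weights: sampling weights;
    strata, psus: stratum and cluster (PSU) labels, one per respondent.
    Hypothetical helper for illustration only.
    """
    total_w = sum(weights)
    p_hat = sum(w * yi for w, yi in zip(weights, y)) / total_w

    # Linearized contribution of each respondent, summed to PSU level.
    # Keying by stratum first treats PSU labels as nested within strata.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for yi, w, h, j in zip(y, weights, strata, psus):
        psu_totals[h][j] += w * (yi - p_hat) / total_w

    # Between-PSU variance within each stratum (with-replacement approximation).
    variance = 0.0
    for h, totals in psu_totals.items():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} has a single PSU")
        mean_h = sum(totals.values()) / n_h
        variance += n_h / (n_h - 1) * sum((z - mean_h) ** 2 for z in totals.values())

    return p_hat, sqrt(variance)


# Hypothetical toy data: 2 strata, 2 PSUs each, 2 respondents per PSU.
y       = [1, 0, 1, 1, 0, 0, 1, 0]
weights = [10, 10, 20, 20, 15, 15, 5, 5]
strata  = ["A", "A", "A", "A", "B", "B", "B", "B"]
psus    = [1, 1, 2, 2, 1, 1, 2, 2]

p_hat, se = weighted_proportion(y, weights, strata, psus)
print(f"proportion = {p_hat:.3f}, design-based SE = {se:.3f}")
```

The point estimate is the same number a weighted pandas crosstab would give; only the standard error depends on the strata and PSU labels. For real analyses a tested package is the safer choice, since this sketch omits finite population corrections, missing-data handling and lonely-PSU adjustments.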

answered Sep 02 '25 14:09 by KBurchfiel
