How to use sample weights in GAM (mgcv) on survey data for Logit regression?

Tags: r, sample, survey, mgcv, gam

I'm interested in performing a GAM regression on data from a nationwide survey that provides sample weights. I read this post with interest. I selected my variables of interest into a data frame:

library(dplyr)   # provides %>% and select()
nhanesAnalysis <- nhanesDemo %>%
                    select(fpl,
                           age,
                           gender,
                           persWeight,
                           psu,
                           strata)

Then, from what I understood, I created a survey design object that carries the weights, with the following code:

library(survey)    
nhanesDesign <- svydesign(    id      = ~psu,
                              strata  = ~strata,
                              weights = ~persWeight,
                              nest    = TRUE,
                              data    = nhanesAnalysis)

Let's say that I want to keep only subjects with age ≥ 30:

ageDesign <- subset(nhanesDesign, age >= 30)

Now, I would like to fit a GAM model (fpl ~ s(age) + gender) with the mgcv package. Is it possible to do so with the weights argument, or by using the svydesign object ageDesign?

EDIT

I was wondering whether it is correct to extract the computed weights from an svyglm object and pass them to the weights argument in GAM.


asked May 26 '19 13:05

Borexino

1 Answer

This is more difficult than it looks. There are two issues:

You want to get the right amount of smoothing.
You want valid standard errors.

Just giving the sampling weights to mgcv::gam() won't do either of these: gam() treats the weights as frequency weights and so will think it has a lot more data than it actually has. You will get undersmoothing and underestimated standard errors because of the weights, and you will also likely get underestimated standard errors because of the cluster sampling.
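
The frequency-weight behaviour can be illustrated with a minimal sketch. It uses base R glm() with a binomial family as a stand-in for gam() and simulated data rather than the survey data; the names x, y and w are hypothetical, and it is only an assumption that the same logic carries over to a binomial gam(). Multiplying every weight by 10 adds no information, yet the reported standard errors shrink by a factor of about sqrt(10), because the fit counts the weights as extra observations.

```r
# Illustrative sketch with simulated data (not the NHANES data from the question).
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + x))

# Integer weights, so a binomial glm() reads them as replicate counts.
w <- sample(1:5, n, replace = TRUE)

fit1  <- glm(y ~ x, family = binomial, weights = w)
fit10 <- glm(y ~ x, family = binomial, weights = 10 * w)

se1  <- sqrt(diag(vcov(fit1)))
se10 <- sqrt(diag(vcov(fit10)))

# The coefficients agree, but the standard errors shrink by about sqrt(10)
# even though no new observations were collected.
print(rbind(coef1 = coef(fit1), coef10 = coef(fit10)))
print(se1 / se10)
```

Sampling weights in a national survey are typically large, since each respondent stands for many people, so the same effect makes a model fitted this way look far more certain than the sample justifies.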

The simple work-around is to use regression splines (splines package) instead. These aren't quite as good as the penalised splines used by mgcv, but the difference usually isn't a big deal, and they work straightforwardly with svyglm. You do need to choose how many degrees of freedom to assign.

library(splines)
# ns(age, 4) is a natural cubic spline in age with 4 degrees of freedom
svyglm(fpl ~ ns(age, 4) + gender, design = nhanesDesign)
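
As a rough end-to-end sketch of this work-around, the code below builds a simulated stand-in for the survey data (the data frame sim, the binary outcome poor, and all simulated values are assumptions made up for illustration, not NHANES data), subsets the design to age >= 30 as in the question, and fits regression-spline models with svyglm. Since the question mentions logit regression, it also shows a binary outcome fitted with family = quasibinomial().

```r
library(survey)
library(splines)

# Simulated stand-in for nhanesAnalysis: 10 strata x 3 PSUs x 30 people.
set.seed(42)
sim <- expand.grid(person = 1:30, psu = 1:3, strata = 1:10)
n <- nrow(sim)
sim$age        <- runif(n, 18, 80)
sim$gender     <- factor(sample(c("Female", "Male"), n, replace = TRUE))
sim$persWeight <- runif(n, 1000, 5000)
sim$fpl        <- pmax(0, 1.5 + 0.02 * sim$age + rnorm(n))
sim$poor       <- as.numeric(sim$fpl < 2)  # hypothetical binary outcome

simDesign <- svydesign(id = ~psu, strata = ~strata, weights = ~persWeight,
                       nest = TRUE, data = sim)

# Subset the design object rather than the data frame,
# so the sampling design is still available for variance estimation.
simAge <- subset(simDesign, age >= 30)

# Continuous outcome: natural cubic spline in age with 4 degrees of freedom.
fitLinear <- svyglm(fpl ~ ns(age, 4) + gender, design = simAge)

# Binary outcome: logit link; quasibinomial() avoids warnings about
# non-integer counts that binomial() gives with sampling weights.
fitLogit <- svyglm(poor ~ ns(age, 4) + gender, design = simAge,
                   family = quasibinomial())

# Estimates with design-based standard errors.
print(round(cbind(estimate = coef(fitLogit),
                  se = sqrt(diag(vcov(fitLogit)))), 3))
```

The 4 in ns(age, 4) is the degrees-of-freedom choice mentioned above; larger values allow a wigglier curve at the cost of more parameters to estimate.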


answered Sep 21 '22 19:09

Thomas Lumley

