Regression for a Rate variable in R

Tags: r, regression, glm, poisson

I was tasked with developing a regression model looking at student enrollment in different programs. This is a very nice, clean data set where the enrollment counts follow a Poisson distribution well. I fit models in R (both a Poisson GLM and a zero-inflated Poisson), and the resulting residuals seemed reasonable.

However, I was then instructed to change the count of students to a "rate", calculated as students / school_population (each school has its own population). This is no longer a count variable but a proportion between 0 and 1, considered the "proportion of enrollment" in a program.

This "rate" (students/population) is no longer Poisson, but is certainly not normal either. So, I'm a bit lost as to the appropriate distribution, and subsequent model to represent it.

A log-normal distribution seems to fit this rate well, but I have many 0 values, and the log-normal has no support at 0, so it can't actually be used.

Any suggestions on the best form of distribution for this new parameter, and how to model it in R?

Thanks!

asked Apr 16 '13 20:04

Noah

1 Answer

As suggested in the comments, you could keep the Poisson model and handle the population with an offset:

glm(response~predictor1+predictor2+predictor3+ ... + offset(log(population)),
     family=poisson,data=...)
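To make the offset idea concrete, here is a minimal sketch on simulated data. The data frame `schools` and its columns `students`, `pop`, and `x` are made up for illustration and are not from the question. The offset fixes the coefficient of log(pop) at 1, so the model is log(E[students]) = log(pop) + b0 + b1*x, which is the same as modelling log(E[students]/pop), i.e. the log of the rate.

```r
# Simulated data; all names (schools, students, pop, x) are hypothetical.
set.seed(1)
n   <- 200
pop <- sample(200:2000, n, replace = TRUE)   # school populations
x   <- rnorm(n)                              # one illustrative predictor

# Assumed true model: log(rate) = -3 + 0.5 * x, E[students] = pop * rate
students <- rpois(n, lambda = pop * exp(-3 + 0.5 * x))
schools  <- data.frame(students, pop, x)

# Counts stay on the left-hand side; the population enters as an offset,
# so the coefficients describe the log enrollment rate students / pop.
fit_pois <- glm(students ~ x + offset(log(pop)),
                family = poisson, data = schools)
print(coef(fit_pois))

# Predicted rate per student at x = 0: setting pop = 1 makes the offset 0,
# so the prediction equals exp(intercept).
new_school <- data.frame(x = 0, pop = 1)
rate_hat   <- predict(fit_pois, newdata = new_school, type = "response")
print(rate_hat)
```

The response stays a count here, so the Poisson fit from before carries over. Only the interpretation changes: exp() of a coefficient is a rate ratio.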

Or you could use a binomial GLM, in either of two equivalent forms:

glm(cbind(response,pop_size-response) ~ predictor1 + ... , family=binomial,
        data=...)

or

glm(response/pop_size ~ predictor1 + ... , family=binomial,
        weights=pop_size,
        data=...)
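As a quick check that the successes/failures form and the proportion-plus-weights form give the same fit, here is a small sketch on simulated data. The names `schools`, `students`, `pop`, and `x` are again hypothetical.

```r
# Simulated data; all names are hypothetical.
set.seed(2)
n   <- 200
pop <- sample(200:2000, n, replace = TRUE)
x   <- rnorm(n)
students <- rbinom(n, size = pop, prob = plogis(-3 + 0.5 * x))
schools  <- data.frame(students, pop, x)

# Form 1: two-column response of (successes, failures)
fit_cbind <- glm(cbind(students, pop - students) ~ x,
                 family = binomial, data = schools)

# Form 2: observed proportion as response, number of trials as weights
fit_prop <- glm(students / pop ~ x,
                family = binomial, weights = pop, data = schools)

# The two fits should agree up to numerical tolerance
print(all.equal(coef(fit_cbind), coef(fit_prop)))
```

Without `weights=pop_size`, the second form would treat every school as a single trial, so the weights are what carry the population information.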

The latter form is sometimes more convenient, although less widely used. Be aware that switching from Poisson to binomial changes the default link function from log to logit, although you can use family=binomial(link="log") if you prefer.
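To illustrate the difference between the two links, this sketch fits the same simulated data with both (all names are hypothetical). With the logit link, exp() of a coefficient is an odds ratio; with the log link it is a ratio of proportions, which matches the Poisson + offset interpretation. One caveat: the log link does not keep fitted probabilities below 1, so it can fail to converge when proportions are large. Here the simulated proportions are small, so that should not be an issue.

```r
# Simulated data with small proportions; all names are hypothetical.
set.seed(3)
n   <- 200
pop <- sample(200:2000, n, replace = TRUE)
x   <- rnorm(n)
# Assumed true model on the log scale: log(p) = -3 + 0.5 * x
students <- rbinom(n, size = pop, prob = exp(-3 + 0.5 * x))
schools  <- data.frame(students, pop, x)

# Default logit link: coefficients are log odds ratios
fit_logit <- glm(cbind(students, pop - students) ~ x,
                 family = binomial, data = schools)

# Log link: coefficients are log ratios of proportions
fit_log <- glm(cbind(students, pop - students) ~ x,
               family = binomial(link = "log"), data = schools)

print(exp(coef(fit_logit)))  # odds ratios
print(exp(coef(fit_log)))    # ratios of proportions
```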

Zero-inflation might be easier to model with the Poisson + offset combination, which is more widely supported than a zero-inflated binomial model. (I'm not sure whether the pscl package, the most common tool for ZIP models, handles offsets, but I think it does.)
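A sketch of what the zero-inflated Poisson + offset fit could look like with `pscl::zeroinfl`, assuming pscl is installed and that `offset()` is accepted in the count part of the formula (my understanding of its documentation, but worth checking against your installed version). The simulated data and all variable names are hypothetical.

```r
# Runs only if pscl is available; all data and names are hypothetical.
if (requireNamespace("pscl", quietly = TRUE)) {
  set.seed(4)
  n   <- 500
  pop <- sample(200:2000, n, replace = TRUE)
  x   <- rnorm(n)

  # Assumption: 30% of schools are structural zeros (no enrollment at all);
  # the rest follow the Poisson rate model log(rate) = -3 + 0.5 * x.
  structural_zero <- rbinom(n, 1, 0.3)
  students <- ifelse(structural_zero == 1, 0L,
                     rpois(n, lambda = pop * exp(-3 + 0.5 * x)))
  schools <- data.frame(students, pop, x)

  # Count model (with offset) before "|", zero-inflation model after it;
  # "| 1" means a constant zero-inflation probability.
  fit_zip <- pscl::zeroinfl(students ~ x + offset(log(pop)) | 1,
                            data = schools, dist = "poisson")

  print(fit_zip$coefficients$count)         # log-rate coefficients
  print(plogis(fit_zip$coefficients$zero))  # estimated zero-inflation prob.
}
```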

I think glmmADMB will do a zero-inflated binomial model, but I haven't tested it.

answered Oct 18 '22 11:10

Ben Bolker
