I have a problem that I have been trying to resolve with no success. After more than two days of searching I haven't found a single clue. Sorry if the answer is out there and I missed it.
Suppose you have a logistic regression (binary outcome model) that was estimated some years ago. You therefore know the parameters βk (k = 1, 2, ..., p), because they were estimated in the past, but you no longer have the data that were used to fit the model.
My question is: can I reconstruct this old estimated logistic model in R as an object (corresponding to a fitted logistic regression model)?
I would like to use the predict() function to apply this logistic regression to a new set of data (present data) and check whether the old model has stood the test of time. And to use that function you need the fitted model object.
Thank you very much in advance.
The predict() function can be used to predict the probability of the outcome, given values of the predictors. The type="response" option tells R to output probabilities of the form P(Y = 1|X), as opposed to other information such as the logit.
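For instance, a minimal sketch of the two scales (assuming a fitted binomial glm called fit and a data frame newdat with the same predictor columns; both names are placeholders, not from the original post):
eta <- predict(fit, newdata = newdat)                     # default type = "link": the logit
p   <- predict(fit, newdata = newdat, type = "response")  # probabilities P(Y = 1 | X)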
Logistic function: the standard logistic regression function, for predicting the outcome of an observation given a predictor variable x, is an S-shaped curve defined as p = exp(y) / [1 + exp(y)], where y is the linear predictor (James et al.).
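A quick, self-contained way to check that formula in base R:
y <- seq(-4, 4, by = 1)
p <- exp(y) / (1 + exp(y))
all.equal(p, plogis(y))  # plogis() is R's built-in inverse logit; same curve
round(p, 3)              # values climb from near 0 to near 1 in an S shape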
The predict() function in R is used to compute predictions from a fitted model on input data. Most modeling functions in R supply their own predict() method, but the interface stays the same regardless of the model type.
Logistic regression is a classification algorithm: it is used to predict a binary outcome from a set of independent variables.
Per my comment, I think you could start by just calculating predictions directly from the coefficients. Here's an example that compares the output from predict.glm to predicted probabilities calculated directly on the data:
# construct some data and model it
# y ~ x1 + x2
set.seed(1)
x1 <- runif(100)
x2 <- runif(100)
y <- rbinom(100,1,(x1+x2)/2)
data1 <- data.frame(x1=x1,x2=x2,y=y)
x3 <- runif(100)
x4 <- runif(100)
y2 <- rbinom(100,1,(x3+x4)/2)
data2 <- data.frame(x1=x3,x2=x4,y=y2)
glm1 <- glm(y~x1+x2,data=data1,family=binomial)
# extract coefs
#summary(glm1)
coef1 <- coef(glm1)
# calculate predicted probabilities for the current data by hand:
# linear predictor, then inverse-logit transform
tmp1 <- coef1[1] + (data1$x1*coef1[2]) + (data1$x2*coef1[3])
pr1 <- 1/(1+exp(-tmp1))
# these match those from `predict` (drop names before comparing):
all.equal(pr1, unname(predict(glm1,data1,type='response')))
# now apply the old coefficients to the new data:
tmp2 <- coef1[1] + (data2$x1*coef1[2]) + (data2$x2*coef1[3])
pr2 <- 1/(1+exp(-tmp2))
pr2
This is obviously not a general solution, nor does it properly handle uncertainty, but I think it's a better approach than hacking predict.
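To generalize the hand calculation above to any number of predictors, one option is to build the design matrix explicitly. This is a sketch I'm adding, not part of the original answer, and the coefficient values in old_coefs are made up for illustration:
# hypothetical coefficients from the old model, in formula order
old_coefs <- c(`(Intercept)` = -0.5, x1 = 1.2, x2 = 0.8)
# design matrix for the new data, matching the old model's formula
X <- model.matrix(~ x1 + x2, data = data2)
pr_new <- plogis(drop(X %*% old_coefs))  # inverse logit of the linear predictor
head(pr_new)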
You can create a glm fit whose linear predictor is entirely an offset built from the coefficients that you have, then use the regular predict function with that fit. For example, using the iris data (first fitting a model on the real data to obtain coefficients, then fitting a new model on dummy data using only those coefficients in an offset):
# fit a real model so its coefficients can play the role of the "old" model
fit1 <- glm( I(Species=='versicolor') ~ Petal.Length + Petal.Width,
    data=iris, family=binomial )
coef(fit1)
# dummy data: only needed to satisfy glm(); its values do not affect the offset fit
dummydata <- data.frame( Petal.Length = rnorm(10), Petal.Width=rnorm(10),
    Species = rep(c('versicolor','other'), each=5) )
# ~ 0 leaves no free parameters: the linear predictor is entirely the fixed offset
fit2 <- glm( I(Species=='versicolor') ~ 0 +
    offset(-2.863708 + 1.563076*Petal.Length - 3.153165*Petal.Width),
    data=dummydata, family=binomial )
# predictions (on the link scale) agree between the two fits
pred1 <- predict(fit1, newdata=iris)
pred2 <- predict(fit2, newdata=iris)
plot(pred1, pred2)
abline(0, 1, col='green')
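If you would rather not hard-code the numbers in the offset, the formula can be built from a coefficient vector. A sketch under the assumption that the old coefficients are available as a named vector in the order intercept, Petal.Length, Petal.Width (here taken from fit1 for demonstration; in practice you would type them in from the old report):
cf <- coef(fit1)
f  <- sprintf("I(Species=='versicolor') ~ 0 + offset(%f + %f*Petal.Length + %f*Petal.Width)",
              cf[1], cf[2], cf[3])
fit3 <- glm(as.formula(f), data=dummydata, family=binomial)
# probabilities on the response scale for the new data:
head(predict(fit3, newdata=iris, type='response'))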