I have a problem that I have been trying to resolve with no success. After more than two days of searching I haven't found a single clue. Sorry if the answer is out there and I missed it.
Suppose you have a logistic regression (binary outcome model) that was estimated some years ago. You therefore know the parameters βk (k = 1, 2, ..., p), because they were estimated in the past, but you no longer have the data that were used to fit the model.
My question is: can I reconstruct this old estimated logistic model in R as an object (corresponding to a fitted logistic regression model)?
I would like to use the predict() function to apply this logistic regression to a new set of data (present data) and check whether the old model has stood the test of time. And to use that function you need the fitted model object.
Thank you very much in advance.
The predict() function can be used to predict the probability of the outcome, given values of the predictors. The type="response" option tells R to output probabilities of the form P(Y = 1|X), as opposed to other information such as the logit.
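For instance, a minimal sketch of the two scales (assuming a fitted binomial glm called fit and a data frame newdat with the same predictor columns; both names are placeholders, not from the original post):
eta <- predict(fit, newdata = newdat)                     # default type = "link": the logit
p   <- predict(fit, newdata = newdat, type = "response")  # probabilities P(Y = 1 | X)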
Logistic function: the standard logistic regression function, for predicting the outcome of an observation given a predictor variable x, is an S-shaped curve defined as p = exp(y) / [1 + exp(y)], where y is the linear predictor (James et al.).
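A quick, self-contained way to check that formula in base R:
y <- seq(-4, 4, by = 1)
p <- exp(y) / (1 + exp(y))
all.equal(p, plogis(y))  # plogis() is R's built-in inverse logit; same curve
round(p, 3)              # values climb from near 0 to near 1 in an S shape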
The predict() function in R is used to compute predictions from a fitted model on input data. Most modeling functions in R supply their own predict() method, but the interface stays the same regardless of the model type.
Logistic regression is a classification algorithm: it is used to predict a binary outcome from a set of independent variables.
Per my comment, I think you could start by just calculating predictions directly from the coefficients. Here's an example that compares the output from predict.glm to predicted probabilities calculated directly on the data:
# construct some data and model it
# y ~ x1 + x2
set.seed(1)
x1 <- runif(100)
x2 <- runif(100)
y <- rbinom(100,1,(x1+x2)/2)
data1 <- data.frame(x1=x1,x2=x2,y=y)
x3 <- runif(100)
x4 <- runif(100)
y2 <- rbinom(100,1,(x3+x4)/2)
data2 <- data.frame(x1=x3,x2=x4,y=y2)
glm1 <- glm(y~x1+x2,data=data1,family=binomial)
# extract coefs
#summary(glm1)
coef1 <- coef(glm1)
# calculate predicted probabilities for the current data by hand:
# linear predictor, then inverse-logit transform
tmp1 <- coef1[1] + (data1$x1*coef1[2]) + (data1$x2*coef1[3])
pr1 <- 1/(1+exp(-tmp1))
# these match those from `predict` (drop names before comparing):
all.equal(pr1, unname(predict(glm1,data1,type='response')))
# now apply the old coefficients to the new data:
tmp2 <- coef1[1] + (data2$x1*coef1[2]) + (data2$x2*coef1[3])
pr2 <- 1/(1+exp(-tmp2))
pr2
This is obviously not a general solution, nor does it properly handle uncertainty, but I think it's a better approach than hacking predict.
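To generalize the hand calculation above to any number of predictors, one option is to build the design matrix explicitly. This is a sketch I'm adding, not part of the original answer, and the coefficient values in old_coefs are made up for illustration:
# hypothetical coefficients from the old model, in formula order
old_coefs <- c(`(Intercept)` = -0.5, x1 = 1.2, x2 = 0.8)
# design matrix for the new data, matching the old model's formula
X <- model.matrix(~ x1 + x2, data = data2)
pr_new <- plogis(drop(X %*% old_coefs))  # inverse logit of the linear predictor
head(pr_new)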
You can create a glm fit whose linear predictor is entirely an offset built from the coefficients that you have, then use the regular predict function with that fit. For example, using the iris data (first fitting a model on the real data to obtain coefficients, then fitting a new model on dummy data using only those coefficients in an offset):
# fit a real model so its coefficients can play the role of the "old" model
fit1 <- glm( I(Species=='versicolor') ~ Petal.Length + Petal.Width,
    data=iris, family=binomial )
coef(fit1)
# dummy data: only needed to satisfy glm(); its values do not affect the offset fit
dummydata <- data.frame( Petal.Length = rnorm(10), Petal.Width=rnorm(10),
    Species = rep(c('versicolor','other'), each=5) )
# ~ 0 leaves no free parameters: the linear predictor is entirely the fixed offset
fit2 <- glm( I(Species=='versicolor') ~ 0 +
    offset(-2.863708 + 1.563076*Petal.Length - 3.153165*Petal.Width),
    data=dummydata, family=binomial )
# predictions (on the link scale) agree between the two fits
pred1 <- predict(fit1, newdata=iris)
pred2 <- predict(fit2, newdata=iris)
plot(pred1, pred2)
abline(0, 1, col='green')
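If you would rather not hard-code the numbers in the offset, the formula can be built from a coefficient vector. A sketch under the assumption that the old coefficients are available as a named vector in the order intercept, Petal.Length, Petal.Width (here taken from fit1 for demonstration; in practice you would type them in from the old report):
cf <- coef(fit1)
f  <- sprintf("I(Species=='versicolor') ~ 0 + offset(%f + %f*Petal.Length + %f*Petal.Width)",
              cf[1], cf[2], cf[3])
fit3 <- glm(as.formula(f), data=dummydata, family=binomial)
# probabilities on the response scale for the new data:
head(predict(fit3, newdata=iris, type='response'))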