I am just wondering whether anyone has some R code that uses the package R2WinBUGS to run logistic regression - ideally with simulated data to generate the 'truth' and two continous co-variates.
Thanks.
Christian
PS:
Potential code to generate artificial data (one dimensional case) and run winbugs via r2winbugs (it does not work yet).
library(MASS)
library(R2WinBUGS)
setwd("d:/BayesianLogisticRegression")
n.site <- 150
X1<- sort(runif(n = n.site, min = -1, max =1))
xb <- 0.0 + 3.0*X1
occ.prob <- 1/(1+exp(-xb))
plot(X1, occ.prob,xlab="X1",ylab="occ.prob")
true.presence <- rbinom(n = n.site, size = 1, prob = occ.prob)
plot(X1, true.presence,xlab="X1",ylab="true.presence")
# combine data as data frame and save
data <- data.frame(X1, true.presence)
write.matrix(data, file = "data.txt", sep = "\t")
sink("model.txt")
cat("
model {
# Priors
alpha ~ dnorm(0,0.01)
beta ~ dnorm(0,0.01)
# Likelihood
for (i in 1:n) {
C[i] ~ dbin(p[i], N) # Note p before N
logit(p[i]) <- alpha + beta *X1[i]
}
}
",fill=TRUE)
sink()
# Bundle data
win.data <- list(mass = X1, n = length(X1))
# Inits function
inits <- function(){ list(alpha=rlnorm(1), beta=rlnorm(1))}
# Parameters to estimate
params <- c("alpha", "beta")
# MCMC settings
nc <- 3 #Number of Chains
ni <- 1200 #Number of draws from posterior
nb <- 200 #Number of draws to discard as burn-in
nt <- 2 Thinning rate
# Start Gibbs sampling
out <- bugs(data=win.data, inits=inits, parameters.to.save=params,
model.file="model.txt", n.thin=nt, n.chains=nc, n.burnin=nb,
n.iter=ni, debug = TRUE)
There are three main types of logistic regression: binary, multinomial and ordinal.
Logistic Regression is a significant machine learning algorithm because it has the ability to provide probabilities and classify new data using continuous and discrete datasets.
Yes...Prediction using Logistic Regression can be done for numerical variables.
Misclassification means the observed category is different from the underlying one and it is a form of measurement error in categorical data. The measurement error in continuous, especially normally distributed, data is well known and studied in the literature.
Your script is exactly the way to do it. It is almost working, it just required one simple change to make it work:
win.data <- list(X1 = X1, n = length(X1), C = true.presence, N = 1)
Which defines which data go to WinBugs. The variable C must be filled with true.presence, N must be 1 according to the data you generated - note that this is a special case of binomial distribution for N = 1, which is called Bernoulli - a simple "coin flip".
Here is the output:
> print(out, dig = 3)
Inference for Bugs model at "model.txt", fit using WinBUGS,
3 chains, each with 1200 iterations (first 200 discarded), n.thin = 2
n.sims = 1500 iterations saved
mean sd 2.5% 25% 50% 75% 97.5% Rhat n.eff
alpha -0.040 0.221 -0.465 -0.187 -0.037 0.114 0.390 1.001 1500
beta 3.177 0.478 2.302 2.845 3.159 3.481 4.165 1.000 1500
deviance 136.438 2.059 134.500 135.000 135.800 137.200 141.852 1.001 1500
as you see, the parameters correspond to the parameters used to generate the data. Also, if you compare with the frequentist solution, you see it corresponds.
EDIT: but the typical logistic (~ binomial) regression would measure some counts with some upper value N[i], and it would allow for different N[i] for each observation. For example say the proportion of juveniles to the whole population (N). This would require just to add index to N in your model:
C[i] ~ dbin(p[i], N[i])
The data generation would look something like:
N = rpois(n = n.site, lambda = 50)
juveniles <- rbinom(n = n.site, size = N, prob = occ.prob)
win.data <- list(X1 = X1, n = length(X1), C = juveniles, N = N)
(end of edit)
For more examples from population ecology see books of Marc Kéry (Introduction to WinBUGS for ecologist, and especially Bayesian Population Analysis using WinBUGS: A hierarchical perspective, which is a great book).
The complete script I used - the corrected script of yours is listed here (comparison with frequentist solution at the end):
#library(MASS)
library(R2WinBUGS)
#setwd("d:/BayesianLogisticRegression")
n.site <- 150
X1<- sort(runif(n = n.site, min = -1, max =1))
xb <- 0.0 + 3.0*X1
occ.prob <- 1/(1+exp(-xb)) # inverse logit
plot(X1, occ.prob,xlab="X1",ylab="occ.prob")
true.presence <- rbinom(n = n.site, size = 1, prob = occ.prob)
plot(X1, true.presence,xlab="X1",ylab="true.presence")
# combine data as data frame and save
data <- data.frame(X1, true.presence)
write.matrix(data, file = "data.txt", sep = "\t")
sink("tmp_bugs/model.txt")
cat("
model {
# Priors
alpha ~ dnorm(0,0.01)
beta ~ dnorm(0,0.01)
# Likelihood
for (i in 1:n) {
C[i] ~ dbin(p[i], N) # Note p before N
logit(p[i]) <- alpha + beta *X1[i]
}
}
",fill=TRUE)
sink()
# Bundle data
win.data <- list(X1 = X1, n = length(X1), C = true.presence, N = 1)
# Inits function
inits <- function(){ list(alpha=rlnorm(1), beta=rlnorm(1))}
# Parameters to estimate
params <- c("alpha", "beta")
# MCMC settings
nc <- 3 #Number of Chains
ni <- 1200 #Number of draws from posterior
nb <- 200 #Number of draws to discard as burn-in
nt <- 2 #Thinning rate
# Start Gibbs sampling
out <- bugs(data=win.data, inits=inits, parameters.to.save=params,
model.file="model.txt", n.thin=nt, n.chains=nc, n.burnin=nb,
n.iter=ni,
working.directory = paste(getwd(), "/tmp_bugs/", sep = ""),
debug = TRUE)
print(out, dig = 3)
# Frequentist approach for comparison
m1 = glm(true.presence ~ X1, family = binomial)
summary(m1)
# normally, you should do it this way, but the above also works:
#m2 = glm(cbind(true.presence, 1 - true.presence) ~ X1, family = binomial)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With