I'm currently working on a research project involving different bat species and habitat fragmentation. My dataset contains presence data (1 = present, 0 = absent) together with fragment size and body mass (both continuous) and feeding guild (Feeding.Guild; categorical, 6 levels: carnivore, frugivore, insectivore, nectarivore, omnivore and sanguinivore). Fragment size (logFrag) and body mass (logMass) are natural-log transformed to better approximate a normal distribution. I can't share the full data set (bats2) because it is confidential.
To analyse this data, I use logistic regression. In R, this is the glm function with the binomial family.
bats2 <- read.csv("Data_StackExchange.csv", quote = "", sep = ";",
                  dec = ".", header = TRUE, row.names = NULL)
bats2$presence <- ifelse(bats2$Corrected.Abundance == 0, 0, 1)
bats2$logFrag <- log(bats2$FragSize)
bats2$logMass <- log(bats2$Mass)
str(bats2$Feeding.Guild)
Factor w/ 6 levels "carnivore","frugivore",..: 6 1 5 5 2 2 2 2 2 2 ...
levels(bats2$Feeding.Guild)
[1] "carnivore" "frugivore" "insectivore" "nectarivore" "omnivore" "sanguinivore"
regPresence <- glm(bats2$presence~(logFrag+logMass+Feeding.Guild),
family="binomial", data=bats2)
The results of this regression, obtained with the summary() function, are as follows.
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.47240 0.64657 -6.917 4.61e-12 ***
logFrag 0.10448 0.03507 2.979 0.002892 **
logMass 0.39404 0.09620 4.096 4.20e-05 ***
Feeding.Guildfrugivore 3.36245 0.49378 6.810 9.78e-12 ***
Feeding.Guildinsectivore 1.97198 0.51136 3.856 0.000115 ***
Feeding.Guildnectarivore 3.85692 0.55379 6.965 3.29e-12 ***
Feeding.Guildomnivore 1.75081 0.51864 3.376 0.000736 ***
Feeding.Guildsanguinivore 1.73381 0.56881 3.048 0.002303 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
My first question is to verify that I am interpreting these results correctly. I used this website to aid my interpretation.
Additionally, I tried plotting the data in order to visualise it. However, when I add facet_wrap() to make separate plots for the different feeding guilds, the intercepts and slopes change compared to colouring the feeding guilds within a single plot. I used the following code:
Plot 1:
library(ggplot2)
qplot(logFrag, bats2$presence, colour=Feeding.Guild, data=bats2, se=F) +
geom_smooth(method = glm, family = "binomial", se=F, na.rm=T) + theme_bw()
Plot 2:
qplot(logFrag, bats2$presence, data=bats2, se=F) + facet_wrap(~Feeding.Guild,
scales="free") +
geom_smooth(method = glm, family = "binomial", se=F, na.rm=T) + theme_bw()
Resulting in the following images:
What causes these differences and which one would be the correct one?
Sample data set (the part of the data set that is not confidential).
The resource you have linked to has an explanation of the interpretation in the bulleted section under the heading "Using the logit model". The Estimate is each covariate's additive effect on the log-odds of presence: per one-unit increase for a continuous covariate, or per level of the categorical relative to its reference level (the sketch after the list below converts these to odds ratios). A couple of points on this:
- There was no need to log-transform FragSize or Mass in order to fit this model; logistic regression makes no normality assumption about the covariates.
- The (Intercept) is the predicted log-odds when logFrag = 0 and logMass = 0 (and the guild is the reference level, carnivore). These 0 values are impossible. This is usual, and it is why interpretation of the (Intercept) is not useful for you.
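As an aside (my own sketch, not part of the linked resource): because the Estimate column is on the log-odds scale, exponentiating it gives odds ratios, which are often easier to report. Assuming the fitted model object regPresence from the question:

# Convert the log-odds estimates into odds ratios
exp(coef(regPresence))
# e.g. exp(0.10448) is about 1.11: each one-unit increase in logFrag
# multiplies the odds of presence by roughly 1.11, holding the other
# covariates fixed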
On to Std. Error: this is a measure of your confidence in your Estimate effects. People often use a normal approximation of ± 2 * Std. Error around the Estimate to form confidence intervals and make statements using them. When the interval of ± 2 * Std. Error contains 0, there is some probability that the true effect is 0. You don't want that, so you're looking for small values of Std. Error relative to the Estimate.
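To make that concrete, here is a small sketch of my own (again using the regPresence object from the question) showing the ± 2 * Std. Error intervals next to R's profile-likelihood intervals:

# Rough 95% intervals: Estimate +/- 2 * Std. Error
coefs <- coef(summary(regPresence))
cbind(lower = coefs[, "Estimate"] - 2 * coefs[, "Std. Error"],
      upper = coefs[, "Estimate"] + 2 * coefs[, "Std. Error"])

# Profile-likelihood intervals, usually a little more accurate for a glm
confint(regPresence)

An interval that excludes 0 on the log-odds scale corresponds to an odds-ratio interval that excludes 1.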
The z value and Pr(>|z|) relate to the normal approximation I mentioned: the z value is the Estimate divided by its Std. Error, and Pr(>|z|) is the corresponding two-sided p-value. You probably already know what a Z score (standard normal) is and how people use them to perform significance tests.
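In code, that relationship looks like this (a short sketch reproducing the last two columns of the summary table):

coefs <- coef(summary(regPresence))
z <- coefs[, "Estimate"] / coefs[, "Std. Error"]  # the "z value" column
p <- 2 * pnorm(-abs(z))                           # the "Pr(>|z|)" column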
Now to your plots: they aren't actually plotting your model. You are using the smoother to fit new models of a similar type, but over different sets of data; the smoother only considers the effect of logFrag, fitting a mini logistic model within each guild.
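If you want to visualise the model you actually fitted, one option (a sketch of my own; it holds logMass at its median because regPresence also depends on body mass) is to plot predicted probabilities from predict():

library(ggplot2)

# Prediction grid: a range of logFrag for each guild, logMass held at its median
newdat <- expand.grid(
  logFrag = seq(min(bats2$logFrag, na.rm = TRUE),
                max(bats2$logFrag, na.rm = TRUE), length.out = 100),
  logMass = median(bats2$logMass, na.rm = TRUE),
  Feeding.Guild = levels(bats2$Feeding.Guild)
)
newdat$pred <- predict(regPresence, newdata = newdat, type = "response")

ggplot(bats2, aes(x = logFrag, y = presence)) +
  geom_point() +
  geom_line(data = newdat, aes(y = pred)) +
  facet_wrap(~Feeding.Guild) +
  theme_bw()

Because the model has no interaction terms, these curves differ between guilds only through their intercepts on the log-odds scale.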
So we expect the plots to differ from the summary(), but not from each other. The reason they do is interesting, and it has to do with using bats2$presence instead of presence. When you pass in bats2$presence, it is effectively like passing ggplot2 a separate, anonymous vector of data. As long as that vector stays aligned with the data frame as you would expect, all is well. It seems that facet_wrap() mixes up the data when you use bats2$presence, probably because it sorts bats2 by guild. Use plain old presence and the two plots will come out the same.
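For completeness, a sketch of the faceted plot with the column referenced by name; note that current ggplot2 versions expect the family to be passed via method.args rather than directly to geom_smooth():

library(ggplot2)

# Faceted logistic smoothers, with presence referenced by name so the rows
# stay aligned with the rest of bats2
ggplot(bats2, aes(x = logFrag, y = presence)) +
  geom_point() +
  geom_smooth(method = "glm", method.args = list(family = "binomial"),
              se = FALSE, na.rm = TRUE) +
  facet_wrap(~Feeding.Guild, scales = "free") +
  theme_bw()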