I have a time series dataset for water temperature, air temperature and flow rate in a river. I have created a GAM model to predict water temperature based on air temperature and flow. However, I have not accounted for the autocorrelation in the datasets: the data points in both the predictors and the dependent variable are not independent (e.g. air temperature on day 2 is not independent of air temperature on day 1).
Can someone help me with the appropriate code to include some form of autocorrelation structure (AR(1)?) in my model? As I understand it, I need to use the gamm() function instead of the gam() function?
My current model looks like this:
library(mgcv)
model <- gam(W.T.Mean ~ s(T.Mean) + s(Discharge), data = Pre_regulation_temp)
W.T.Mean is mean daily water temperature, T.Mean is mean daily air temperature, and Discharge is mean daily flow.
Thanks in advance
Generalized Additive Models (GAMs) extend Generalized Linear Models (GLMs) so that some predictor variables can be modeled non-parametrically, through smooth functions, alongside linear and polynomial terms for other predictors.
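To make the GLM/GAM distinction concrete, here is a minimal sketch with mgcv on simulated data (the variables x1, x2, x3 and the data are hypothetical): x1 enters as an ordinary linear term, x2 as a quadratic polynomial, and x3 through a penalized smooth s(x3).

```r
library(mgcv)

set.seed(1)
n <- 200
# Hypothetical simulated predictors and response
d <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
d$y <- 1 + 2 * d$x1 + 3 * d$x2^2 + sin(2 * pi * d$x3) + rnorm(n, sd = 0.3)

# Linear term for x1, quadratic polynomial for x2, penalized smooth for x3
m <- gam(y ~ x1 + poly(x2, 2) + s(x3), data = d)

p_tab <- summary(m)$p.table   # parametric (linear/polynomial) coefficients
s_tab <- summary(m)$s.table   # smooth term with its effective degrees of freedom
p_tab
s_tab
```

The parametric table holds the intercept, the x1 slope and the two polynomial coefficients, while the smooth table reports the estimated EDF of s(x3) instead of a fixed number of coefficients.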
There are many alternative packages. Examples include the R packages mboost, which implements a boosting approach; gss, which provides full spline smoothing methods; VGAM, which provides vector GAMs; and gamlss, which provides generalized additive models for location, scale and shape.
AIC for GAMs: comparing GAMs with a form of AIC is an alternative frequentist approach to model selection. Rather than the marginal likelihood, it uses the likelihood of $\boldsymbol{\beta}_j$ conditional on $\lambda_j$, with the effective degrees of freedom (EDF) replacing $k$, the number of model parameters, i.e. $\mathrm{AIC} = -2 \log L(\hat{\boldsymbol{\beta}} \mid \boldsymbol{\lambda}) + 2\,\mathrm{EDF}$.
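In mgcv, calling AIC() on fitted gam objects uses a log-likelihood whose degrees of freedom are based on the effective degrees of freedom rather than the raw number of basis coefficients. A minimal sketch comparing models with and without a second smooth, on hypothetical simulated data; fitting with method = "ML" is an assumption here (a common choice when the compared models differ in their terms), not something stated above.

```r
library(mgcv)

set.seed(2)
n <- 300
d <- data.frame(x = runif(n), z = runif(n))
# z has no real effect on y in this simulation
d$y <- sin(2 * pi * d$x) + rnorm(n, sd = 0.4)

m1 <- gam(y ~ s(x), data = d, method = "ML")
m2 <- gam(y ~ s(x) + s(z), data = d, method = "ML")

# The "df" column is EDF-based, not a count of basis coefficients
aic_tab <- AIC(m1, m2)
print(aic_tab)

# EDF-based df from logLik versus the raw number of coefficients
attr(logLik(m1), "df")
length(coef(m1))
```

Because the penalty shrinks s(x) well below its 10 basis coefficients, the EDF-based df is noticeably smaller than the coefficient count.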
The GAM framework is based on an appealing and simple mental model: relationships between the individual predictors and the dependent variable follow smooth patterns that can be linear or nonlinear. We can estimate these smooth relationships simultaneously and then predict $g(E(Y))$ by simply adding them up.
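The "add them up" idea can be checked directly in mgcv: predict(..., type = "terms") returns each term's contribution on the link scale, with the intercept stored in the "constant" attribute, and their sum reproduces the linear predictor $g(E(Y))$. A minimal sketch on hypothetical simulated Poisson data, so that the link g is the log:

```r
library(mgcv)

set.seed(3)
n <- 250
d <- data.frame(x1 = runif(n), x2 = runif(n))
# Log link: log(E(Y)) = 0.5 + f1(x1) + f2(x2)
mu <- exp(0.5 + sin(2 * pi * d$x1) + 0.8 * d$x2^2)
d$y <- rpois(n, mu)

m <- gam(y ~ s(x1) + s(x2), family = poisson, data = d)

# Per-term contributions on the link scale; intercept in attr "constant"
tt <- predict(m, type = "terms")
eta_sum <- rowSums(tt) + attr(tt, "constant")

# Linear predictor computed directly, i.e. g(E(Y)) = log(E(Y))
eta <- predict(m, type = "link")

all.equal(unname(eta_sum), unname(as.vector(eta)))
# Applying the inverse link to the sum gives the fitted means
all.equal(unname(exp(eta_sum)), unname(fitted(m)))
```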
You actually have several choices:
1. gamm() with correlation = corAR1(form = ~ time), where time is the variable giving the ordering in time of the evenly spaced observations;
2. bam(), specifying a known value of rho, the AR(1) parameter.

That said, the issue for inference is that the model assumes that, conditional upon the estimated model (i.e. the effects of the covariates), the response is independent and identically distributed. Put another way, we expect the residuals of the model to be independent (not autocorrelated). If the instantaneous (smooth) effect of air temperature on water temperature is sufficient to leave the model residuals independent, then you do not necessarily need to do anything to correct the model.
However, if the estimated smooth effect of air temperature is quite wiggly, that might suggest the estimated effect is being affected by autocorrelation in the data. I would expect a relatively simple relationship between air and water temperature, with saturating effects at both the low and high ends: you can't make water go below 0 °C but air temperatures can go well below that, and likewise at the high end you don't get the same increase in water temperature for a unit increase in air temperature. So check the estimated smooth to see whether the effect is more complex than you would expect. If it is, try fitting with gamm() and see how that changes the estimated smooth. If it doesn't make much difference, go back to the original gam() and look at the autocorrelation function of the model residuals. If that shows a problem with autocorrelation, then you either need to correct it by adding terms to your gam() model, or switch back to gamm() with correlation = ... specified and use that for inference.
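The checking workflow above can be sketched as follows, again on hypothetical simulated data with AR(1) errors (variable names are illustrative). It fits the plain gam(), plots the estimated smooths, inspects the ACF of its residuals, then refits with gamm() and examines the normalized residuals, which account for the estimated correlation structure. The 2/sqrt(n) band is only the usual rough approximate 95% limit for a white-noise ACF, not a formal test.

```r
library(mgcv)
library(nlme)

set.seed(7)
n    <- 365
day  <- seq_len(n)
air  <- 10 + 8 * sin(2 * pi * day / 365) + rnorm(n, sd = 2)
flow <- exp(rnorm(n, mean = 2, sd = 0.5))
e    <- as.numeric(arima.sim(list(ar = 0.7), n = n, sd = 0.8))
d    <- data.frame(time = day, air, flow,
                   wt = 5 + 0.6 * air - 0.5 * log(flow) + e)

# 1. Fit without a correlation structure and look at the estimated smooths
m_gam <- gam(wt ~ s(air) + s(flow), data = d, method = "REML")
plot(m_gam, pages = 1, shade = TRUE)

# 2. ACF of the gam residuals (rows must be in time order)
acf_gam <- acf(resid(m_gam), plot = FALSE)
lim     <- 2 / sqrt(n)   # rough white-noise limit
acf_gam$acf[2]           # lag-1 autocorrelation

# 3. Refit with AR(1) errors and check the normalized residuals
m_gamm   <- gamm(wt ~ s(air) + s(flow), data = d,
                 correlation = corAR1(form = ~ time))
acf_norm <- acf(resid(m_gamm$lme, type = "normalized"), plot = FALSE)
acf_norm$acf[2]

# 4. Compare the smooths from the gamm fit with those from step 1
plot(m_gamm$gam, pages = 1, shade = TRUE)
```

In this simulation the lag-1 autocorrelation of the gam residuals sits far outside the rough limit, while the normalized gamm residuals should be much closer to white noise.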
Another, more complex, option is to use the brms package, which can also estimate models with AR or ARMA correlation structures.