I have a dataset of mean annual temperature values at different latitudes in different years. I want to use this to predict the latitude at which a given temperature could be found in a given year; i.e., "in 1980, at what latitude would the mean annual temperature have been 20C?"
I need to use year-specific models for this, because the relationship between latitude and temperature has changed over time (although not in the sample data below, which is randomly generated). This will involve using `predict.gam` to calculate a predicted value for every element in a list of temperatures. The desired output has one row per prediction, with columns `year`, `newdata_value` (the temperature value used for prediction), and `predicted_value` (the latitude from feeding each `newdata_value` into the year-specific GAM).

Here's a toy dataset:
years <- seq(1968, 2018, 1)
lat <- seq(34.5, 44.5, 1)
dat <- expand.grid(years, lat)
names(dat) <- c("years","lat")
dat$temp <- runif(dim(dat)[1], 5, 20) # add random temperature data points
newdata_values <- seq(2, 16, 2) # temperature values to use for prediction
I've tried various `purrr` and split-apply-combine solutions and haven't figured anything out. Any suggestions?
Another option is to fit a model that allows the lat/temp relationship to vary by year. There are several options for this. The following fits a model where each year has an independent relationship:
gam(lat ~ year + s(temp, by = year), data = dat)
Note that for this formulation `year` should be encoded as a factor.
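As a minimal sketch with the toy data from the question (the factor column name `fyear` and the reduced basis size `k = 5` are my own choices, the latter so that each per-year smooth can be estimated from only 11 points per year):

```r
library(mgcv)

# Toy data as in the question
set.seed(1)
years <- seq(1968, 2018, 1)
lat <- seq(34.5, 44.5, 1)
dat <- expand.grid(years = years, lat = lat)
dat$temp <- runif(nrow(dat), 5, 20)

# Encode year as a factor so s(temp, by = ...) fits one smooth per year
dat$fyear <- factor(dat$years)
m_by <- gam(lat ~ fyear + s(temp, by = fyear, k = 5), data = dat)
```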
An alternative would be to allow the lat/temp relationship to vary smoothly by year, a reasonable model if this relationship gradually changes over time. In this case you will want to use a tensor product smooth (`te()`) to indicate a two-way interaction between variables that are on different scales (degrees, years):
gam(lat ~ te(temp, year), data = dat)
In both cases you can then make a prediction with `predict.gam(model, newdata = new_dat)`, where `new_dat` has both `year` and `temp` columns.
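As a sketch with the toy data from the question (whose column is named `years` rather than `year`; the grid of prediction temperatures mirrors the question's `newdata_values`):

```r
library(mgcv)

# Toy data as in the question
set.seed(1)
dat <- expand.grid(years = seq(1968, 2018, 1), lat = seq(34.5, 44.5, 1))
dat$temp <- runif(nrow(dat), 5, 20)

# Smoothly year-varying lat/temp relationship via a tensor product smooth
m_te <- gam(lat ~ te(temp, years), data = dat)

# One row per year/temperature combination to predict
new_dat <- expand.grid(years = seq(1968, 2018, 1),
                       temp  = seq(2, 16, 2))
new_dat$predicted_lat <- as.numeric(predict(m_te, newdata = new_dat))
```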
One approach would be to use nested dataframes. I used code found in this tutorial.
You can group by year and use `nest`. I'll also rename columns and add the new values to predict:
library(tidyverse); library(mgcv)
names(dat) <- c('year', 'lat', 'temp')
dat2 <- dat %>% group_by(year) %>% nest()
dat2 <- dat2 %>% mutate(newdata_value = rep(list(newdata_values), n_distinct(year)))
You then define some helper functions to make the tidyverse code cleaner (I'm assuming you're using `gam` from the mgcv package). Then map the model function to the data and map the predict function to the fitted models:
lat_gam <- function(df) {
gam(lat ~ s(temp), data = df)
}
pred_gam <- function(mod) {
predict.gam(mod, newdata = data.frame(temp = newdata_values))
}
dat2 <- dat2 %>% mutate(model = map(data, lat_gam))
dat2 <- dat2 %>% mutate(predicted_value = map(model, pred_gam))
dat2 %>% select(-data, -model) %>% unnest(cols = c(newdata_value, predicted_value))
The last line is totally optional; it just gets the final output to print in the format you specified.
Here's a data.table approach:
library(data.table)
library(mgcv)
setDT(dat)
dat[, .(pred = c(predict.gam(gam(lat ~ temp), list(temp = newdata_values))),
newdata_values),
by = years]
The only problem I had was that the `predict.gam(...)` call returns an array. Wrapping it in `c(...)` converts it to a plain vector.
A similar base approach that does not have perfect formatting:
by(dat[, -1],
dat[, 1],
function(DF) {
mod = gam(lat ~ temp, data = DF)
pred = predict.gam(mod, list(temp = newdata_values))
data.frame(newdata_values, pred)
}
)
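If you want the per-year pieces stacked into a single data frame, the list that `by()` returns can be combined with `do.call(rbind, ...)`; a sketch under the same toy data (the `res` and `out` names are my own):

```r
library(mgcv)

# Toy data as in the question
set.seed(1)
dat <- expand.grid(years = seq(1968, 2018, 1), lat = seq(34.5, 44.5, 1))
dat$temp <- runif(nrow(dat), 5, 20)
newdata_values <- seq(2, 16, 2)

# One linear-on-temp model per year, predicted at newdata_values
res <- by(dat[, c("lat", "temp")], dat$years, function(DF) {
  mod <- gam(lat ~ temp, data = DF)
  data.frame(newdata_values,
             pred = as.numeric(predict.gam(mod, list(temp = newdata_values))))
})

# Stack the per-year pieces and recover the year from the list names
out <- do.call(rbind, res)
out$years <- rep(as.numeric(names(res)), each = length(newdata_values))
```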