Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

predict values from GAM for grouped dataframe in R

Tags:

r

apply

purrr

gam

I have a dataset of mean annual temperature values at different latitudes in different years. I want to use this to predict the latitude at which a given temperature could be found in a given year; i.e., "in 1980, at what latitude would the mean annual temperature have been 20C?"

I need to use year-specific models for this, because the relationship between latitude and temperature has changed over time (although not in the sample data below, which is randomly generated). This will involve:

  1. Fit GAMs to a dataset grouped/split by year.
  2. For each different GAM (that is, for each year), use predict.gam to calculate a predicted value for every element in a list of temperatures.
  3. Recombine these to get a dataframe with columns representing year, newdata_value (the temperature value used for prediction), and predicted_value (latitude from feeding each newdata_value into the year-specific GAM).

Here's a toy dataset:

years <- seq(1968, 2018, 1)
lat <- seq(34.5, 44.5, 1)
dat <- expand.grid(years, lat)
names(dat) <- c("years","lat")
dat$temp <- runif(dim(dat)[1], 5, 20) # add random temperature data points 
newdata_values <- seq(2, 16, 2) # temperature values to use for prediction

I've tried various purrr and split-apply-combine solutions and haven't figured anything out. Any suggestions?

like image 644
AFH Avatar asked Oct 23 '19 20:10

AFH


Video Answer


3 Answers

Another option is to fit a model that allows the lat/temp relationship to vary by year. There are several options for this. The following fits a model where each year has an independent relationship:

gam(lat ~ year + s(temp, by = year), data = dat)

Note that for this formulation year should be encoded as a factor.

An alternative would be to allow the lat/temp relationship to vary smoothly by year, a reasonable model if this relationship gradually changes over time. In this case you will want to use a tensor product smooth (te()) to indicate a two-way interaction between variables that are on different scales (degrees, years):

gam(lat ~ te(temp, year), data = dat)

In both cases you can then make a prediction with predict.gam(model, newdata = new_dat), where new_dat has both year and temp columns.

like image 173
Noam Ross Avatar answered Oct 06 '22 10:10

Noam Ross


One approach would be to use nested dataframes. I used code found in this tutorial.

You can group by year and use nest. I'll also rename columns and add the new values to predict:

library(tidyverse); library(mgcv)
names(dat) <- c('year', 'lat', 'temp')
dat2 <- dat %>% group_by(year) %>% nest()

dat2 <- dat2 %>% mutate(newdata_value = rep(list(newdata_values), n_distinct(year)))

You then define some helper functions to made the tidyverse code cleaner (I'm assuming your using gam from the mgcv package). Then map the model function to the data and map the predict function to the fitted models:

lat_gam <- function(df) {
  gam(lat ~ s(temp), data = df)
}

pred_gam <- function(mod) {
  predict.gam(mod, newdata = data.frame(temp = newdata_values))
}

dat2 <- dat2 %>% mutate(model = map(data, lat_gam))

dat2 <- dat2 %>% mutate(predicted_value = map(model, pred_gam))


dat2 %>% select(-data, -model) %>% unnest(cols = c(newdata_value, predicted_value))

The last line is totally optional, just gets the final output to print like the way you specified in 3)

like image 1
astrofunkswag Avatar answered Oct 06 '22 11:10

astrofunkswag


Here's a data.table approach:

library(data.table)
library(mgcv)

setDT(dat)

dat[, .(pred = c(predict.gam(gam(lat ~ temp), list(temp = newdata_values))),
        newdata_values),
    by = years]

The only problem I had was that the predict.gam(...) call returns an array. The c(predict.gam(...)) converts it to an array.

A similar base approach that does not have perfect formatting:

by(dat[, -1],
   dat[, 1],
   function(DF) {
     mod = gam(lat ~ temp, data = DF)
     pred = predict.gam(mod, list(temp = newdata_values))

     data.frame(newdata_values, pred)
     }
   )
like image 1
Cole Avatar answered Oct 06 '22 11:10

Cole