I'm fitting a random forest using the R package ranger to classify a raster image. The prediction function produces an error and hereafter I provide a reproducible example.
library(raster)
library(nnet)
library(ranger)
data(iris)
# put iris data into raster
r<-list()
for(i in 1:4){
r[[i]]<-raster(nrows=10, ncols=15)
r[[i]][]<-iris[,i]
}
r<-stack(r)
names(r)<-names(iris)[1:4]
# multinom (an example that works)
nn.model <- multinom(Species ~ ., data=iris, trace=F)
nn.pred<-predict(r,nn.model)
# ranger (doesn't work)
ranger.model<-ranger(Species ~ ., data=iris)
ranger.pred<-predict(r,ranger.model)
The error given is
Error in v[cells, ] <- predv : incorrect number of subscripts on matrix
although the error with my real data is
Error in p[-naind, ] <- predv : number of items to replace is not a multiple of replacement length
The only thing that crosses my mind is that the ranger.prediction object includes several elements other than the predictions of interest. Anyway, how ranger could be used to predict on a raster stack?
Random forests is a classification and regression algorithm originally designed for the machine learning community. This algorithm is increasingly being applied to satellite and aerial image classification and the creation of continuous fields data sets, such as, percent tree cover and biomass.
The number of variables selected at each split is denoted by mtry in randomforest function. Select mtry value with minimum out of bag(OOB) error. In this case, mtry = 4 is the best mtry as it has least OOB error. mtry = 4 was also used as default mtry.
Node size in Random Forest refers to the smallest node which can be split, so when you increase the node size , you will grow smalller trees, which means you will lose the previous predictive power. Increasing tree size works the other way, It should increase the accuracy.
Edit, 2021-07-15
There was a question about using clusterR
, and I have found a more straightforward approach that what I first suggested. The new code does the same thing as the original, but in a simpler way and with an option for parallel processing:
# First train the ranger model
ranger.model <- ranger(Species ~ .
, data = iris
, probability = TRUE # This argument is needed for se
, keep.inbag = TRUE # So is this one
)
# Create prediction function for clusterR
f_se <- function(model, ...) predict(model, ...)$se
# Predict se using clusterR
beginCluster(2)
map_se <- clusterR(r
, predict
, args = list(ranger.model
, type = 'se' # Remember to include this argument
, fun = f_se
)
)
endCluster()
Original answer, 2018-05-31
You can run predictions from a ranger model on a raster stack by training the model within the train function of the caret package:
library(caret)
ranger.model <- train(Species ~ ., data = iris, method = "ranger")
ranger.pred <- predict(r, ranger.model)
However, this doesn't work if you want to predict the standard error, as the prediction function for train objects does not accept type = 'se'
. I got around this by building a function for the purpose using this document:
https://cran.r-project.org/web/packages/raster/vignettes/functions.pdf
# Function to predict standard errors on a raster
predfun <- function(x, model, type, filename)
{
out <- raster(x)
bs <- blockSize(out)
out <- writeStart(out, filename, overwrite = TRUE)
for (i in 1:bs$n) {
v <- getValues(x, row = bs$row[i], nrows = bs$nrows[i])
nas <- apply(v, 1, function(x) sum(is.na(x)))
p <- numeric(length = nrow(v))
p[nas > 0] <- NA
p[nas == 0] <- predict(object = model,
v[nas == 0,],
type = 'se')$se
out <- writeValues(out, p, bs$row[i])
}
out <- writeStop(out)
return(out)
}
# New ranger model
ranger.model <- ranger(Species ~ .
, data = iris
, probability = TRUE
, keep.inbag = TRUE
)
# Run predictions
se <- predfun(r
, model = ranger.model
, type = "se"
, filename = paste0(getwd(), "/se.tif")
)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With