Image classification (raster stack) with random forest (package ranger)

Tags: r, raster

I'm fitting a random forest with the R package ranger to classify a raster image. The predict function throws an error; a reproducible example follows.

library(raster)
library(nnet)
library(ranger)
data(iris)

# put iris data into raster
r<-list()
for(i in 1:4){
  r[[i]]<-raster(nrows=10, ncols=15)
  r[[i]][]<-iris[,i]
}
r<-stack(r)
names(r)<-names(iris)[1:4]

# multinom (an example that works)
nn.model <- multinom(Species ~ ., data = iris, trace = FALSE)
nn.pred<-predict(r,nn.model)

# ranger (doesn't work)
ranger.model<-ranger(Species ~ ., data=iris)   
ranger.pred<-predict(r,ranger.model)

The error given is

Error in v[cells, ] <- predv : incorrect number of subscripts on matrix

although the error with my real data is

Error in p[-naind, ] <- predv : number of items to replace is not a multiple of replacement length

My only guess is that the ranger.prediction object returned by predict contains several elements besides the predictions of interest. In any case, how can ranger be used to predict on a raster stack?

Hugo asked Sep 21 '17 22:09


1 Answer

Edit, 2021-07-15

There was a question about using clusterR, and I have found a more straightforward approach than what I first suggested. The new code does the same thing as the original, but in a simpler way and with an option for parallel processing:

library(raster)
library(ranger)

# First train the ranger model

ranger.model <- ranger(Species ~ .
                       , data = iris
                       , probability = TRUE  # This argument is needed for se
                       , keep.inbag = TRUE   # So is this one
                       )


# Create prediction function for clusterR:
# predict on a ranger model returns a list-like object, so keep only the se element

f_se <- function(model, ...) predict(model, ...)$se


# Predict se using clusterR (r is the raster stack from the question)

beginCluster(2)

map_se <- clusterR(r
                   , predict
                   , args = list(model = ranger.model
                                 , type = 'se'  # Remember to include this argument
                                 , fun = f_se
                                 )
                   )

endCluster()
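
The same wrapper idea should also cover the plain class predictions asked about in the question. What follows is a minimal sketch, not a tested recipe for real data: it rebuilds the iris raster stack from the question, and the names rf, f_class and class_map are made up for illustration. It relies on the fun argument of raster's predict, which lets you supply a function that returns only the $predictions element of the ranger prediction object.

```r
library(raster)
library(ranger)

# Rebuild the 10 x 15 raster stack from the question (150 cells = 150 iris rows)
r <- stack(lapply(1:4, function(i) {
  layer <- raster(nrows = 10, ncols = 15)
  layer[] <- iris[, i]
  layer
}))
names(r) <- names(iris)[1:4]

# Ordinary classification forest (no probability = TRUE here)
rf <- ranger(Species ~ ., data = iris)

# Hypothetical helper: keep only the predictions from the ranger.prediction object
f_class <- function(model, ...) predict(model, ...)$predictions

# raster's predict calls f_class(model, data) block by block
class_map <- predict(r, rf, fun = f_class)
```

With the iris example, the cell values of class_map should be the integer codes 1 to 3 of the Species levels, since raster stores factor predictions as integers.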

Original answer, 2018-05-31

You can run predictions from a ranger model on a raster stack by training the model within the train function of the caret package:

library(caret)
ranger.model <- train(Species ~ ., data = iris, method = "ranger")  
ranger.pred <- predict(r, ranger.model)

However, this doesn't work if you want to predict the standard error, as the prediction function for train objects does not accept type = 'se'. I got around this by writing a custom function based on this document:

https://cran.r-project.org/web/packages/raster/vignettes/functions.pdf

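# Function to predict standard errors on a raster, block by block
# (type must be 'se', since the $se element is extracted below)
predfun <- function(x, model, type, filename)
{
  out <- raster(x)
  bs <- blockSize(out)
  out <- writeStart(out, filename, overwrite = TRUE)
  for (i in 1:bs$n) {
    # Read one block of rows as a matrix (cells x layers)
    v <- getValues(x, row = bs$row[i], nrows = bs$nrows[i])
    # Cells with a missing value in any layer stay NA
    ok <- rowSums(is.na(v)) == 0
    p <- rep(NA_real_, nrow(v))
    if (any(ok)) {
      se <- predict(object = model,
                    data = v[ok, , drop = FALSE],
                    type = type)$se
      # If $se has one column per class (as for a probability forest), keep the first
      p[ok] <- as.matrix(se)[, 1]
    }
    out <- writeValues(out, p, bs$row[i])
  }
  out <- writeStop(out)
  return(out)
}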

# New ranger model 
ranger.model <- ranger(Species ~ .
                       , data = iris
                       , probability = TRUE
                       , keep.inbag  = TRUE
                       )
# Run predictions
se <- predfun(r
              , model = ranger.model
              , type  = "se"
              , filename = file.path(getwd(), "se.tif")
              )
ABMoeller answered Oct 15 '22 13:10