Image classification (raster stack) with random forest (package ranger)

Tags: r, raster

I'm fitting a random forest with the R package ranger to classify a raster image. The prediction function throws an error; a reproducible example follows.

library(raster)
library(nnet)
library(ranger)
data(iris)

# put iris data into raster
r <- list()
for (i in 1:4) {
  r[[i]] <- raster(nrows = 10, ncols = 15)
  r[[i]][] <- iris[, i]
}
r <- stack(r)
names(r) <- names(iris)[1:4]

# multinom (an example that works)
nn.model <- multinom(Species ~ ., data = iris, trace = FALSE)
nn.pred <- predict(r, nn.model)

# ranger (doesn't work)
ranger.model <- ranger(Species ~ ., data = iris)
ranger.pred <- predict(r, ranger.model)

The error given is

Error in v[cells, ] <- predv : incorrect number of subscripts on matrix

although the error with my real data is

Error in p[-naind, ] <- predv : number of items to replace is not a multiple of replacement length

My only guess is that the ranger.prediction object contains several elements besides the predictions of interest. In any case, how can ranger be used to predict on a raster stack?

asked Sep 21 '17 22:09

Hugo

1 Answer

Edit, 2021-07-15

There was a question about using clusterR, and I have found a more straightforward approach than what I first suggested. The new code does the same thing as the original, but in a simpler way and with an option for parallel processing:

# First train the ranger model

ranger.model <- ranger(Species ~ .
                       , data = iris
                       , probability = TRUE  # This argument is needed for se
                       , keep.inbag = TRUE   # So is this one
                       )


# Create prediction function for clusterR

f_se <- function(model, ...) predict(model, ...)$se


# Predict se using clusterR
  
beginCluster(2)

map_se <- clusterR(r
                   , predict
                   , args = list(ranger.model
                                 , type = 'se'  # Remember to include this argument
                                 , fun = f_se
                                 )
                   )

endCluster()
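
The same wrapper idea should also cover plain class predictions, which is what the question asks for, without going through caret. The sketch below is not part of the original answer. It assumes that raster's predict() hands each block of cell values to `fun` as a data frame, and `f_class` is a name made up for illustration. Since predict() on a ranger model returns a ranger.prediction object rather than a vector, the wrapper pulls out its `predictions` element.

```r
library(raster)
library(ranger)

# Rebuild the example stack from the question: four layers of 10 x 15 cells
r <- stack(lapply(1:4, function(i) {
  layer <- raster(nrows = 10, ncols = 15)
  layer[] <- iris[, i]
  layer
}))
names(r) <- names(iris)[1:4]

# Classification forest (no probability = TRUE, so predictions are class labels)
ranger.model <- ranger(Species ~ ., data = iris)

# Hypothetical wrapper: return only the predictions element
f_class <- function(model, ...) predict(model, ...)$predictions

# The predicted factor is expected to be stored as integer class codes
ranger.pred <- predict(r, ranger.model, fun = f_class)
```

The integer codes should follow the order of levels(iris$Species), so it is worth checking a few cells against the training data before relying on the map.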

Original answer, 2018-05-31

You can run predictions from a ranger model on a raster stack by training the model within the train function of the caret package:

library(caret)
ranger.model <- train(Species ~ ., data = iris, method = "ranger")  
ranger.pred <- predict(r, ranger.model)

However, this doesn't work if you want to predict the standard error, as the prediction function for train objects does not accept type = 'se'. I got around this by building a function for the purpose using this document:

https://cran.r-project.org/web/packages/raster/vignettes/functions.pdf

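# Function to predict standard errors on a raster, block by block
# Note: only type = "se" is supported, because the se element is extracted
predfun <- function(x, model, type, filename)
{
  out <- raster(x)
  bs <- blockSize(out)
  out <- writeStart(out, filename, overwrite = TRUE)
  for (i in seq_len(bs$n)) {
    v <- getValues(x, row = bs$row[i], nrows = bs$nrows[i])
    # TRUE for cells that have a value in every layer
    ok <- rowSums(is.na(v)) == 0
    p <- rep(NA_real_, nrow(v))
    if (any(ok)) {
      # drop = FALSE keeps v a matrix even when a single row is left
      se <- predict(object = model,
                    data = v[ok, , drop = FALSE],
                    type = type)$se
      # If se is a matrix (assumed: one column per class for a
      # probability forest), keep the first column
      p[ok] <- as.matrix(se)[, 1]
    }
    out <- writeValues(out, p, bs$row[i])
  }
  out <- writeStop(out)
  return(out)
}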

# New ranger model 
ranger.model <- ranger(Species ~ .
                       , data = iris
                       , probability = TRUE
                       , keep.inbag  = TRUE
                       )
# Run predictions
se <- predfun(r
              , model = ranger.model
              , type  = "se"
              , filename = paste0(getwd(), "/se.tif")
              )
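
To see why both versions extract `$se` instead of using the return value of predict() directly, it can help to look at the prediction object on a plain data frame first. This is only an inspection sketch, not part of the original answer, and it makes no claim about the exact shape of `se`. Checking the dimensions shows whether the standard errors come back as one column or as one column per class.

```r
library(ranger)

# Probability forest with in-bag counts kept, as in the answer
ranger.model <- ranger(Species ~ ., data = iris,
                       probability = TRUE, keep.inbag = TRUE)

# Predict standard errors for the training rows as a plain data frame
pred <- predict(ranger.model, data = iris[, 1:4], type = "se")

class(pred)              # a prediction object, not a plain vector
names(pred)              # lists its elements, including se
dim(as.matrix(pred$se))  # rows are observations; check the column count
```

If `se` has several columns, a raster written from it will only hold one of them, so it is worth deciding which class the standard-error map should describe.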

answered Oct 15 '22 13:10

ABMoeller
