When I'm running random forest model over my test data I'm getting different results for the same data set + model.
Here are the results where you can see the difference over the first column:
> table(predict(rfModelsL[[1]], newdata = a), a$earlyR)

        FALSE TRUE
  FALSE    14    7
  TRUE     13   66

> table(predict(rfModelsL[[1]], newdata = a), a$earlyR)

        FALSE TRUE
  FALSE    15    7
  TRUE     12   66
Although the difference is very small, I'm trying to understand what causes it. My guess is that predict uses a "flexible" classification threshold, although I couldn't find anything about that in the documentation. Am I right?
Thank you in advance.
I will assume that you did not refit the model here, and that it is simply the repeated predict call that is producing these results. The answer is probably this, from ?predict.randomForest:

"Any ties are broken at random, so if this is undesirable, avoid it by using odd number ntree in randomForest()"

In your two confusion matrices exactly one observation flips between FALSE and TRUE across calls, which is the behaviour a randomly broken vote tie would produce: with an even number of trees, a borderline observation can receive exactly half its votes for each class, and the tie is resolved by a coin flip on every call to predict.
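To illustrate the mechanism, here is a minimal sketch (in Python, not the randomForest C internals) of majority voting with random tie-breaking. The forest_predict helper and the vote counts are invented for this illustration; the point is only that an even tree count admits 50/50 ties, while an odd count cannot:

```python
import random

def forest_predict(tree_votes, rng):
    """Majority vote over per-tree boolean class votes.
    Ties are broken at random, mirroring the documented
    behaviour of predict.randomForest (sketch, not real code)."""
    yes = sum(tree_votes)
    no = len(tree_votes) - yes
    if yes > no:
        return True
    if no > yes:
        return False
    return rng.choice([True, False])  # exact tie: coin flip

# A borderline observation under an even ntree can tie exactly ...
even_votes = [True] * 250 + [False] * 250   # 500 trees: perfect 50/50 tie
# ... while an odd ntree can never tie.
odd_votes = [True] * 250 + [False] * 249    # 499 trees: majority always exists

rng = random.Random(0)
results = {forest_predict(even_votes, rng) for _ in range(100)}
print(results)       # both classes appear across repeated calls

odd_results = {forest_predict(odd_votes, rng) for _ in range(100)}
print(odd_results)   # stable: always the same class
```

This is why the documentation suggests an odd ntree: it removes the possibility of a tie entirely, making repeated predictions on the same data deterministic.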