I am using the randomForest package in R, which can compute a proximity matrix (P). The package documentation describes this parameter as: "if proximity=TRUE when randomForest is called, a matrix of proximity measures among the input (based on the frequency that pairs of data points are in the same terminal nodes)."
I obtain the proximity matrix of a random forest as follows:
P <- randomForest(x, y, ntree = 1000, proximity=TRUE)$proximity
When I inspect the P matrix, I see values like P(i,j)=0.971014493, where i and j are two data instances in my training data set (x). Such a value does not make sense to me: when it is multiplied by 1000 (the number of trees in the forest), the result is not an integer, so it cannot be a plain "frequency" out of 1000 trees. Could someone please help me understand why I get such real numbers in the proximity matrix?
The term "proximity" means the "closeness" or "nearness" between pairs of cases. Proximities are calculated for each pair of cases/observations/sample points. If two cases occupy the same terminal node in a given tree, their proximity is increased by one.
In its basic form, the proximity between two samples is the number of times the two samples are placed in the same terminal node of the same tree in the forest, divided by the number of trees in the forest.
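As a rough illustration of that counting rule, here is a minimal base R sketch. The helper `proximity_from_nodes` and the toy node IDs are made up for this example; it is not the package's internal code, and it uses every tree rather than only out-of-bag ones.

```r
# Hypothetical helper: proximity from a matrix of terminal-node IDs
# (rows = cases, columns = trees), counting over ALL trees.
proximity_from_nodes <- function(nodes) {
  n <- nrow(nodes)
  prox <- matrix(0, n, n)
  for (t in seq_len(ncol(nodes))) {
    # outer(..., "==") is TRUE where two cases share a terminal node in tree t
    prox <- prox + outer(nodes[, t], nodes[, t], "==")
  }
  prox / ncol(nodes)  # divide the counts by the number of trees
}

# Toy input: 3 cases, 4 trees (node IDs are invented)
nodes <- cbind(c(1, 1, 2), c(3, 3, 3), c(5, 6, 6), c(2, 2, 4))
P_toy <- proximity_from_nodes(nodes)
P_toy
```

In this toy input, cases 1 and 2 share a node in three of the four trees, so their proximity is 0.75.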
Proximities are used in replacing missing data, locating outliers, and producing illuminating low-dimensional views of the data.
This happens because, just as with the default predictions, the default proximity is calculated using only the trees where neither observation was included in the sample used to build that tree (both were "out-of-bag").
The number of times this happens will vary slightly for each pair of cases, and certainly won't be a nice round number like 1000.
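To get a feel for how small and irregular that denominator can be, here is a small base R simulation. It assumes each tree is grown on a bootstrap sample of size n drawn with replacement; the values of `n` and the seed are arbitrary, and this is only a sketch of the idea, not the package's code.

```r
set.seed(42)    # arbitrary seed, only for reproducibility
n     <- 100    # hypothetical number of training cases
ntree <- 1000

both_oob <- 0
for (t in seq_len(ntree)) {
  inbag <- sample.int(n, size = n, replace = TRUE)  # bootstrap sample for tree t
  # cases 1 and 2 are both out-of-bag if neither index was drawn
  if (!any(inbag %in% c(1, 2))) both_oob <- both_oob + 1
}
both_oob  # denominator for the pair (1, 2); far fewer than ntree

# The value in the question is consistent with, for example,
# 67 shared terminal nodes out of 69 trees where both cases were out-of-bag.
all.equal(67 / 69, 0.971014493, tolerance = 1e-8)
```

Other fractions with the same ratio (such as 134/138) would give the same value, so 67/69 is only one possible reading.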
You'll note that the very next parameter listed after proximity is called oob.prox, indicating whether to use only out-of-bag pairs (the default) or use each and every tree.
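If you want proximities that are whole-number counts over all trees, a sketch along these lines should work. It uses the built-in iris data purely as a stand-in for your own x and y, and the seed is arbitrary.

```r
library(randomForest)

set.seed(1)  # arbitrary seed, only for reproducibility
# iris is a stand-in data set; substitute your own x and y
rf_all <- randomForest(iris[, -5], iris$Species, ntree = 1000,
                       proximity = TRUE, oob.prox = FALSE)

# With oob.prox = FALSE every tree contributes, so multiplying by ntree
# should recover integer counts (up to floating-point error).
counts <- rf_all$proximity * 1000
max(abs(counts - round(counts)))
```

The trade-off is that these proximities include trees in which the cases were used for training, so they are likely to look more optimistic than the out-of-bag version.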