I'm trying to perform clustering in Python using Random Forests. In the R implementation of Random Forests, there is a flag you can set to get the proximity matrix. I can't seem to find anything similar in the python scikit version of Random Forest. Does anyone know if there is an equivalent calculation for the python version?
Proximities are calculated for each pair of cases/observations/sample points. If two cases occupy the same terminal node through one tree, their proximity is increased by one. At the end of the run of all trees, the proximities are normalized by dividing by the number of trees.
Once Random F orest has b een t rained, the p roximity matrix q uantifies s ample- similarity. The proximity between two samples is calculated by measuring the number of times that these two samples are placed in the same terminal node of the same tree of RF, divided by the number of trees in the forest.
The sub-sample size is controlled with the max_samples parameter if bootstrap=True (default), otherwise the whole dataset is used to build each tree. Read more in the User Guide. The number of trees in the forest. Changed in version 0.22: The default value of n_estimators changed from 10 to 100 in 0.22.
Based on Gilles Louppe answer I have written a function. I don't know if it is effective, but it works. Best regards.
def proximityMatrix(model, X, normalize=True):
terminals = model.apply(X)
nTrees = terminals.shape[1]
a = terminals[:,0]
proxMat = 1*np.equal.outer(a, a)
for i in range(1, nTrees):
a = terminals[:,i]
proxMat += 1*np.equal.outer(a, a)
if normalize:
proxMat = proxMat / nTrees
return proxMat
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
train = load_breast_cancer()
model = RandomForestClassifier(n_estimators=500, max_features=2, min_samples_leaf=40)
model.fit(train.data, train.target)
proximityMatrix(model, train.data, normalize=True)
## array([[ 1. , 0.414, 0.77 , ..., 0.146, 0.79 , 0.002],
## [ 0.414, 1. , 0.362, ..., 0.334, 0.296, 0.008],
## [ 0.77 , 0.362, 1. , ..., 0.218, 0.856, 0. ],
## ...,
## [ 0.146, 0.334, 0.218, ..., 1. , 0.21 , 0.028],
## [ 0.79 , 0.296, 0.856, ..., 0.21 , 1. , 0. ],
## [ 0.002, 0.008, 0. , ..., 0.028, 0. , 1. ]])
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With