Proximity Matrix in sklearn.ensemble.RandomForestClassifier

I'm trying to perform clustering in Python using Random Forests. In the R implementation of Random Forests, there is a flag you can set to get the proximity matrix. I can't find anything similar in the scikit-learn implementation of Random Forest. Does anyone know if there is an equivalent calculation for the Python version?

asked Sep 09 '13 16:09

WtLgi

1 Answer

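Based on Gilles Louppe's answer, I have written a function. It uses model.apply(X) to get the leaf each sample lands in for every tree, then counts how often two samples share a leaf. I don't know if it is efficient, but it works. Best regards.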

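import numpy as np

def proximityMatrix(model, X, normalize=True):
    # Leaf index of every sample in every tree, shape (n_samples, n_trees)
    terminals = model.apply(X)
    nTrees = terminals.shape[1]

    # For each pair of samples, count the trees in which they share a leaf
    a = terminals[:, 0]
    proxMat = 1 * np.equal.outer(a, a)

    for i in range(1, nTrees):
        a = terminals[:, i]
        proxMat += 1 * np.equal.outer(a, a)

    # Turn the counts into the fraction of trees
    if normalize:
        proxMat = proxMat / nTrees

    return proxMat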
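
The loop above builds one dense n-by-n comparison per tree. A possible alternative, sketched here rather than benchmarked, is to one-hot encode the leaf indices and take a sparse matrix product. The name proximity_matrix_sparse is hypothetical (it is not part of scikit-learn), and the small forest below exists only to make the sketch runnable.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder


def proximity_matrix_sparse(model, X, normalize=True):
    """Hypothetical helper: same result as the loop version, via sparse algebra."""
    leaves = model.apply(X)  # (n_samples, n_trees) leaf indices
    n_trees = leaves.shape[1]
    # One indicator column per (tree, leaf) pair; the encoder output is sparse.
    indicators = OneHotEncoder().fit_transform(leaves)
    # Entry (i, j) counts the trees in which samples i and j share a leaf.
    counts = (indicators @ indicators.T).toarray()
    return counts / n_trees if normalize else counts


# Small illustrative forest (parameters chosen only for this sketch).
data = load_breast_cancer()
forest = RandomForestClassifier(n_estimators=50, min_samples_leaf=40, random_state=0)
forest.fit(data.data, data.target)
prox = proximity_matrix_sparse(forest, data.data)
```

The result is still a dense n-by-n array, so memory use grows quadratically with the number of samples either way.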

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
train = load_breast_cancer()

model = RandomForestClassifier(n_estimators=500, max_features=2, min_samples_leaf=40)
model.fit(train.data, train.target)
proximityMatrix(model, train.data, normalize=True)
## array([[ 1.   ,  0.414,  0.77 , ...,  0.146,  0.79 ,  0.002],
##        [ 0.414,  1.   ,  0.362, ...,  0.334,  0.296,  0.008],
##        [ 0.77 ,  0.362,  1.   , ...,  0.218,  0.856,  0.   ],
##        ..., 
##        [ 0.146,  0.334,  0.218, ...,  1.   ,  0.21 ,  0.028],
##        [ 0.79 ,  0.296,  0.856, ...,  0.21 ,  1.   ,  0.   ],
##        [ 0.002,  0.008,  0.   , ...,  0.028,  0.   ,  1.   ]])
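
Since the question is about clustering, here is a minimal sketch of one way the proximity matrix could be used for that. It assumes that 1 - proximity is an acceptable dissimilarity and uses SciPy's average-linkage hierarchical clustering; both choices, the cut into two clusters, and the helper name proximity_matrix are illustrative assumptions, not something the answer prescribes. Note that the forest here is trained on the class labels, so the proximities reflect that supervision.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier


def proximity_matrix(model, X):
    """Fraction of trees in which each pair of samples shares a leaf."""
    leaves = model.apply(X)  # (n_samples, n_trees) leaf indices
    n_samples, n_trees = leaves.shape
    prox = np.zeros((n_samples, n_samples))
    for t in range(n_trees):
        col = leaves[:, t]
        prox += np.equal.outer(col, col)
    return prox / n_trees


data = load_breast_cancer()
forest = RandomForestClassifier(n_estimators=100, min_samples_leaf=40, random_state=0)
forest.fit(data.data, data.target)

prox = proximity_matrix(forest, data.data)

# Assumption: treat 1 - proximity as a dissimilarity between samples.
dist = 1.0 - prox
np.fill_diagonal(dist, 0.0)

# SciPy expects the condensed (upper-triangle) form of the distance matrix.
condensed = squareform(dist, checks=False)
tree = linkage(condensed, method="average")

# Cut the dendrogram into at most two clusters (an arbitrary choice here).
labels = fcluster(tree, t=2, criterion="maxclust")
```

labels holds one cluster id per sample, starting at 1; a different linkage method or number of clusters may suit other data better.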

answered Sep 21 '22 11:09

Vyga

Tags:

python

scikit-learn

random-forest
