
Unsupervised Random Forest Proximities in Python

I am currently revisiting a random forest project I carried out a few years ago in R. The workflow was to:

  1. Generate a proximity matrix of the input data using an unsupervised random forest.
  2. Calculate a distance matrix from this proximity matrix and pass it to the Partitioning Around Medoids (PAM) clustering algorithm.
  3. Using the clusters obtained through PAM, run a random forest in supervised mode to train a new model.
  4. Use this model to predict on another dataset from a future point in time.
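
For reference, steps 1 and 2 above could be sketched in Python roughly as follows, assuming NumPy and scikit-learn are installed. The helper name `rf_proximity`, the demo data `X_demo`, and the parameter values are hypothetical and chosen only for illustration. The synthetic class is built by permuting each column independently, which is one common way to run a forest without labels, not necessarily what any particular R package does internally.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rf_proximity(X, n_trees=200, seed=0):
    """Unsupervised random forest proximities (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Synthetic class: permute each column independently, which keeps the
    # marginal distributions but breaks the dependence between features.
    X_synth = np.column_stack([rng.permutation(col) for col in X.T])

    # Label real rows 1 and synthetic rows 0, then fit a classifier
    # that tries to tell the two apart.
    X_all = np.vstack([X, X_synth])
    y_all = np.concatenate([np.ones(n), np.zeros(n)])
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(X_all, y_all)

    # apply() gives, per sample and per tree, the index of the leaf reached.
    leaves = forest.apply(X)  # shape (n, n_trees)

    # Proximity = fraction of trees in which two samples share a leaf.
    prox = np.zeros((n, n))
    for t in range(leaves.shape[1]):
        prox += leaves[:, t][:, None] == leaves[:, t][None, :]
    return prox / leaves.shape[1]


# Demo on made-up data: two well-separated Gaussian blobs.
rng = np.random.default_rng(42)
X_demo = np.vstack([rng.normal(0, 1, (30, 4)), rng.normal(5, 1, (30, 4))])
prox = rf_proximity(X_demo, n_trees=50)

# Step 2: turn proximities into distances.
dist = np.sqrt(1.0 - prox)
```

The resulting `dist` array is a symmetric matrix with zeros on the diagonal, so it can be handed to any clustering routine that accepts precomputed distances. The pairwise loop uses memory proportional to the square of the number of samples, so this sketch is only practical for moderately sized datasets.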

I have shifted my workflow to Python for many of my projects, as the language is very flexible and fun, but I am still getting my bearings in sklearn compared to how I performed such tasks in R. My hangup is in producing a proximity matrix (or some container holding the proximities between samples) to pass to PAM. I found a post that describes a similar issue, but I have been unable to find a way to implement what the author of its accepted answer suggests.

Any clues as to how to implement this? Any help would be greatly appreciated, and I will be sure to return the favor to the larger community. I know there are lots of other R-to-Python converts out there who would benefit from this sort of information.

Thanks in advance, and apologies if there is a simple solution that I am overlooking.

Michael Lindgren asked Nov 10 '22 03:11


1 Answer

You can use the bigrf package, written in R (https://cran.r-project.org/web/packages/bigrf/bigrf.pdf). It has everything you need.

Here is how you can implement it in R:

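# load the bigrf library
library('bigrf')

# x is assumed to be your matrix or data frame of input features
# append a synthetic second class so the forest can be grown without labels
synthetic.df <- generateSyntheticClass(x)

# grow the random forest on real vs. synthetic observations
forest <- bigrfc(synthetic.df$x, synthetic.df$y, trace = 1)

# compute proximities and keep only the rows/columns of the real observations
prox <- proximities(forest, trace = 2)
prox <- data.frame(as.matrix(prox))
prox <- prox[1:nrow(x), 1:nrow(x)]

# convert proximities to distances
dist <- sqrt(1 - prox)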
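
For readers working in Python, the last step of the R snippet (distance = sqrt(1 - proximity)) followed by a medoid-based clustering pass might look like the sketch below. It assumes a proximity matrix is already available as a NumPy array; the toy `prox` values and the `k_medoids` helper are hypothetical and exist only for illustration. The helper is a simple alternating k-medoids, not the full PAM build/swap procedure, so its results may differ from R's `pam`.

```python
import numpy as np


def k_medoids(dist, k, n_iter=100, seed=0):
    """Simple alternating k-medoids on a precomputed distance matrix.

    Hypothetical helper; this is not the full PAM build/swap procedure.
    """
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        # Assign every sample to its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid if a cluster empties out
            # New medoid: the member with the smallest total distance
            # to the other members of its cluster.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels


# Toy proximity matrix (made-up values): two groups of three samples,
# high proximity within a group and low proximity between groups.
prox = np.full((6, 6), 0.1)
prox[:3, :3] = 0.9
prox[3:, 3:] = 0.9
np.fill_diagonal(prox, 1.0)

# Same conversion as the R snippet: distance = sqrt(1 - proximity).
dist = np.sqrt(1.0 - prox)

medoids, labels = k_medoids(dist, k=2)
```

The `labels` array can then serve as the target for a supervised `RandomForestClassifier`, which in turn can predict cluster membership for data from a later point in time.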
Soroosh answered Nov 14 '22 22:11