I am currently revisiting a random forests project I did a few years ago in R, in order to:
I have shifted much of my workflow to Python, as the language is very flexible and fun, but I am still getting my bearings in sklearn compared with how I did such tasks in R. My hang-up is producing a proximity matrix (or some container holding the pairwise proximities between samples) to pass to PAM. I found the following post, which describes a similar issue, but I have been unable to work out how to implement what the accepted answer's author suggests.
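A minimal sketch of one common way to get a proximity matrix out of a fitted scikit-learn forest: `RandomForestClassifier.apply` returns, for every sample, the index of the leaf it reaches in each tree, so the proximity of two samples can be taken as the fraction of trees in which their leaf indices match. The iris data and the `rf_proximity` helper are illustrative assumptions, not from the post, and this version counts every tree for every pair (no out-of-bag restriction).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier


def rf_proximity(forest, X):
    """Hypothetical helper: proximity[i, j] = fraction of trees where i and j share a leaf."""
    leaves = forest.apply(X)  # shape (n_samples, n_estimators): leaf index per tree
    n_samples, n_trees = leaves.shape
    prox = np.zeros((n_samples, n_samples))
    for t in range(n_trees):
        # True wherever two samples land in the same leaf of tree t
        same_leaf = leaves[:, t][:, None] == leaves[:, t][None, :]
        prox += same_leaf
    return prox / n_trees


X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
prox = rf_proximity(forest, X)
dissim = 1.0 - prox  # dissimilarity matrix for a precomputed-distance clustering step
```

The loop over trees keeps memory at O(n²) rather than materializing an n × n × n_trees array. `dissim` can then be handed to any PAM implementation that accepts a precomputed dissimilarity matrix; scikit-learn itself does not ship PAM, but the third-party scikit-learn-extra package's `KMedoids` accepts `metric="precomputed"`.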
Any clues as to how to implement this? Any help would be greatly appreciated, and I will be sure to pass it on to the larger community. I know there are lots of other R-to-Python converts out there who would benefit from this sort of information.
Thanks in advance, and apologies if this is a simple solution that I am overlooking.
You can use the bigrf package for R (https://cran.r-project.org/web/packages/bigrf/bigrf.pdf). It has everything you need.
This is how you can implement it in R:
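```r
# load the bigrf library
library(bigrf)
# x: your data frame of (unlabelled) predictors
# generate a synthetic class so the unsupervised problem becomes a classification one
synthetic.df <- generateSyntheticClass(x)
# fit the random forest on real vs. synthetic rows
forest <- bigrfc(synthetic.df$x, synthetic.df$y, trace = 1)
# calculate proximities between all rows (real and synthetic)
prox <- proximities(forest, trace = 2)
prox <- as.matrix(prox)
# keep only the rows/columns of the original observations (assumed to come first)
prox <- prox[1:nrow(x), 1:nrow(x)]
# convert proximities to distances (named rf.dist to avoid masking stats::dist)
rf.dist <- sqrt(1 - prox)
```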
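For readers who want to stay in Python, the R recipe above can be mirrored roughly as follows and then carried through to PAM. This is a sketch under stated assumptions: the synthetic class is built by resampling each column independently from its own observed values (the product-of-marginals idea behind Breiman's unsupervised mode; I have not checked that it matches `generateSyntheticClass` exactly), and `pam` is a hypothetical, naive BUILD + SWAP helper written for illustration, using first-improvement swaps rather than the optimized algorithm in R's `cluster::pam`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier


def make_synthetic(X, rng):
    """Assumed synthetic class: resample each column independently from its own
    observed values, keeping the marginals but breaking feature dependence."""
    n = X.shape[0]
    return np.column_stack(
        [rng.choice(X[:, j], size=n, replace=True) for j in range(X.shape[1])]
    )


def unsupervised_rf_distance(X, n_estimators=500, seed=0):
    """RF distance sqrt(1 - proximity) between the real rows of X."""
    rng = np.random.default_rng(seed)
    X_syn = make_synthetic(X, rng)
    # Real rows first, then synthetic rows; label real = 1, synthetic = 0
    X_all = np.vstack([X, X_syn])
    y_all = np.concatenate(
        [np.ones(len(X), dtype=int), np.zeros(len(X_syn), dtype=int)]
    )
    forest = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
    forest.fit(X_all, y_all)
    # Only the real rows are needed, mirroring prox[1:nrow(x), 1:nrow(x)] in R
    leaves = forest.apply(X)  # (n_samples, n_estimators) leaf indices
    # Broadcasted comparison: memory is O(n^2 * n_estimators), fine for small data
    prox = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)
    return np.sqrt(1.0 - prox)


def pam(D, k, max_iter=100):
    """Hypothetical naive PAM (BUILD + first-improvement SWAP) on a distance matrix."""
    n = D.shape[0]
    # BUILD: start from the most central point, then greedily add medoids
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        nearest = D[:, medoids].min(axis=1)
        candidates = [c for c in range(n) if c not in medoids]
        costs = [np.minimum(nearest, D[:, c]).sum() for c in candidates]
        medoids.append(candidates[int(np.argmin(costs))])
    cost = D[:, medoids].min(axis=1).sum()
    # SWAP: exchange a medoid with a non-medoid whenever that lowers total cost
    for _ in range(max_iter):
        improved = False
        for i in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = c
                trial_cost = D[:, trial].min(axis=1).sum()
                if trial_cost < cost - 1e-12:
                    medoids, cost, improved = trial, trial_cost, True
        if not improved:
            break
    # Each sample is assigned to its nearest medoid
    labels = np.argmin(D[:, medoids], axis=1)
    return np.array(medoids), labels


X, _ = load_iris(return_X_y=True)  # labels ignored: unsupervised setting
D = unsupervised_rf_distance(X, n_estimators=200)
medoids, labels = pam(D, k=3)
```

Computing proximities only for the real rows gives the same values as slicing the full real-plus-synthetic matrix, because a pair's proximity depends only on the leaves those two rows reach. For larger data, swap the broadcasted comparison for a per-tree loop and use a proper PAM implementation.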