OpenCV Iterative random forest training

I'm using the random forest algorithm as the classifier for my thesis project. The training set consists of thousands of images, and for each image about 2000 pixels get sampled. For each pixel I have hundreds of thousands of features. With my current hardware limitations (8 GB of RAM, possibly extendable to 16 GB) I can only fit the samples (i.e. the features per pixel) for a single image in memory. My question is: is it possible to call the train method multiple times, each time with a different image's samples, and have the statistical model updated automatically at each call? I'm particularly interested in the variable importance since, after training on the full training set with the whole feature set, my idea is to reduce the number of features from hundreds of thousands to about 2000, keeping only the most important ones.
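For reference, a quick back-of-envelope check of the numbers in the question (the per-pixel feature count and float width are assumptions: ~100,000 features stored as 32-bit floats, which is what OpenCV's CV_32F sample matrices would use):

```python
# Back-of-envelope memory check for one image's sample matrix.
# Assumed numbers: ~2000 sampled pixels per image, ~100,000 features
# per pixel, stored as 32-bit floats (OpenCV's CV_32F).
pixels_per_image = 2000
features_per_pixel = 100_000
bytes_per_float = 4  # CV_32F

bytes_per_image = pixels_per_image * features_per_pixel * bytes_per_float
gb_per_image = bytes_per_image / 1024**3
print(f"{gb_per_image:.2f} GB per image")  # ~0.75 GB
```

At roughly 0.75 GB per image, a single image's samples fit in 8 GB alongside the training overhead, but a second or third image quickly would not, which matches the constraint described above.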

Thank you for any advice, Daniele

mUogoro asked Nov 05 '12 14:11

2 Answers

I don't think the algorithm supports incremental training. You could reduce the dimensionality of your descriptors before training, using some other feature-reduction method. Alternatively, estimate the variable importance on a random subset of pixels taken from all your training images, as many as you can fit into memory.
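The second suggestion can be sketched as follows. This uses scikit-learn's RandomForestClassifier as a stand-in for OpenCV's random forest (the idea is the same: train once on a pixel subsample, then rank features by importance); `load_image_samples` is a hypothetical loader standing in for the real per-image feature extraction, and the sizes are toy values:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier  # stand-in for OpenCV's CvRTrees

rng = np.random.default_rng(0)

def load_image_samples(i, n_pixels=2000, n_features=50):
    # Hypothetical loader: returns the (n_pixels, n_features) sample
    # matrix and label vector for image i. Here a toy signal where the
    # label depends on feature 0, so one feature should dominate.
    X = rng.normal(size=(n_pixels, n_features)).astype(np.float32)
    y = (X[:, 0] > 0).astype(np.int32)
    return X, y

n_images = 10
pixels_kept_per_image = 200  # keep only a fraction of each image's pixels

X_parts, y_parts = [], []
for i in range(n_images):
    X, y = load_image_samples(i)
    idx = rng.choice(len(X), size=pixels_kept_per_image, replace=False)
    X_parts.append(X[idx])
    y_parts.append(y[idx])

# The subsample from all images fits in memory at once.
X_sub = np.vstack(X_parts)
y_sub = np.concatenate(y_parts)

rf = RandomForestClassifier(n_estimators=50, random_state=0)
rf.fit(X_sub, y_sub)

# Rank features by importance and keep only the top ones.
top = np.argsort(rf.feature_importances_)[::-1][:10]
print("top features:", top)
```

With the real data you would tune `pixels_kept_per_image` so that the stacked subsample stays within the 8 GB budget, then retrain on the full pixel set using only the selected features.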

remi answered Oct 09 '22 23:10

See my answer to this post. There are incremental versions of random forests, and they will let you train on much larger data.
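One approximate way to get chunk-by-chunk training with off-the-shelf tools, sketched below, is scikit-learn's `warm_start` flag: each call to `fit()` keeps the existing trees and grows only the newly requested ones on the current chunk, so each image's samples can be loaded, used, and discarded in turn. This is not a true online random forest (old trees never see new data), and `image_chunk` is a toy stand-in for the real per-image loader:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

def image_chunk(i, n_pixels=500, n_features=20):
    # Hypothetical per-image loader with a learnable toy signal.
    X = rng.normal(size=(n_pixels, n_features)).astype(np.float32)
    y = (X[:, 0] + X[:, 1] > 0).astype(np.int32)
    return X, y

rf = RandomForestClassifier(n_estimators=0, warm_start=True, random_state=0)
trees_per_chunk = 10
for i in range(5):  # one chunk per "image"
    X, y = image_chunk(i)
    rf.n_estimators += trees_per_chunk
    rf.fit(X, y)  # trains only the 10 new trees on this chunk

print(len(rf.estimators_))  # 50 trees total
```

Because each group of trees is trained on a different image, the ensemble is weaker than one trained on all samples at once, but it keeps the memory footprint at a single image's worth of data.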

killogre answered Oct 10 '22 01:10