
Random Forest Regressor using a custom objective/ loss function (Python/ Sklearn)

I want to build a Random Forest Regressor to model count data (Poisson distribution). The default 'mse' criterion is not suited to this problem. Is there a way to define a custom loss function and pass it to the random forest regressor in Python (sklearn, etc.)?

Is there an implementation in any Python package for fitting count data?

vishmay asked Mar 26 '18 14:03




1 Answer

In sklearn this is currently not supported. See the discussion in the corresponding issue, or the similar issue for another class, where the reasons are discussed in more detail (mainly the large computational overhead of calling a Python function).

So it could be done as discussed in those issues: fork sklearn, implement the cost function in Cython, and then add it to the list of available 'criterion' values.
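
For the count-data case specifically, a custom loss may not be needed: scikit-learn 1.0 and later accept criterion="poisson" in RandomForestRegressor, which chooses splits by reduction in Poisson deviance. A minimal sketch, assuming scikit-learn >= 1.0 is installed; the synthetic data, coefficients and hyperparameters are made up for illustration only:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_poisson_deviance
from sklearn.model_selection import train_test_split

# Synthetic count data (illustrative assumption, not from the question):
# the Poisson rate depends log-linearly on the first two features.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(500, 3))
rate = np.exp(0.5 + 1.5 * X[:, 0] - 1.0 * X[:, 1])
y = rng.poisson(rate)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# criterion="poisson" requires scikit-learn >= 1.0 and non-negative targets.
forest = RandomForestRegressor(
    n_estimators=100,
    criterion="poisson",
    min_samples_leaf=5,
    random_state=0,
)
forest.fit(X_train, y_train)

# The deviance metric needs strictly positive predictions, so guard with a
# tiny floor before scoring.
pred = np.maximum(forest.predict(X_test), 1e-8)
deviance = mean_poisson_deviance(y_test, pred)
print(f"mean Poisson deviance on held-out data: {deviance:.3f}")
```

This does not give an arbitrary user-defined loss; that still needs the Cython route described above. It only covers the Poisson case from the question.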

Marcus V. answered Oct 17 '22 19:10