Even on small applications (<50K rows, <50 columns), using the mean absolute error criterion with sklearn's RandomForestRegressor is nearly 10x slower than using mean squared error. To illustrate on a small dataset:
import time
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import load_boston

X, y = load_boston(return_X_y=True)

def fit_rf_criteria(criterion, X=X, y=y):
    reg = RandomForestRegressor(n_estimators=100,
                                criterion=criterion,
                                n_jobs=-1,
                                random_state=1)
    start = time.time()
    reg.fit(X, y)
    end = time.time()
    print(end - start)

fit_rf_criteria('mse')  # 0.13266682624816895
fit_rf_criteria('mae')  # 1.26043701171875
Why does using the 'mae' criterion take so long for training a RandomForestRegressor? I want to optimize MAE for larger applications, but find the speed of the RandomForestRegressor tuned to this criterion prohibitively slow.
Thank you @hellpanderr for sharing a reference to the project issue. To summarize: when the random forest regressor optimizes for MSE, it optimizes for the L2-norm and a mean-based impurity metric. But when the regressor uses the MAE criterion, it optimizes for the L1-norm, which amounts to calculating the median. Unfortunately, sklearn's implementation of the MAE criterion currently appears to take O(N^2) time.
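To see why the two criteria lead to different computations, here is a small illustrative numpy sketch (a brute-force search over constant predictions, not sklearn's actual splitter): the constant that minimizes squared error is the mean, while the constant that minimizes absolute error is the median. The mean can be maintained incrementally with running sums as a split point slides, but the median cannot, which is where the extra cost comes from.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=1001)

# Brute-force search over candidate constant predictions
candidates = np.linspace(y.min(), y.max(), 2001)
sse = [np.square(y - c).sum() for c in candidates]  # L2 loss
sae = [np.abs(y - c).sum() for c in candidates]     # L1 loss

best_l2 = candidates[np.argmin(sse)]
best_l1 = candidates[np.argmin(sae)]

print(best_l2, y.mean())      # L2 minimizer is (approximately) the mean
print(best_l1, np.median(y))  # L1 minimizer is (approximately) the median
```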