I'm wondering if there is an implementation of the Balanced Random Forest (BRF) in recent versions of the scikit-learn package. BRF is used in the case of imbalanced data. It works like a normal RF, but at each bootstrapping iteration it balances the class prevalence by undersampling. For example, given two classes with N0 = 100 and N1 = 30 instances, at each random sampling it draws (with replacement) 30 instances from the first class and the same number of instances from the second class, i.e. it trains each tree on a balanced data set. For more information please refer to this paper.
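To make the sampling scheme concrete, here is a minimal sketch of one balanced bootstrap draw as described above (the names y, rng, idx_0, idx_1 and n_min are just placeholders for illustration, not from any library):

import numpy as np

# Toy labels: 100 majority-class (0) and 30 minority-class (1) instances.
y = np.array([0] * 100 + [1] * 30)
rng = np.random.default_rng(42)

n_min = np.bincount(y).min()      # size of the minority class (30)
idx_0 = np.flatnonzero(y == 0)    # indices of the majority class
idx_1 = np.flatnonzero(y == 1)    # indices of the minority class

# One balanced bootstrap sample: n_min draws with replacement from each class.
boot = np.concatenate([
    rng.choice(idx_0, size=n_min, replace=True),
    rng.choice(idx_1, size=n_min, replace=True),
])
print(np.bincount(y[boot]))       # -> [30 30]; one tree would be trained on this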
RandomForestClassifier() does have the 'class_weight=' parameter, which might be set to 'balanced', but I'm not sure that it is related to downsampling of the bootstrapped training samples.
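For reference, this is the option I mean; as far as I can tell it only reweights the classes when growing each tree, and every tree still sees an ordinary (unbalanced) bootstrap sample:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.random.RandomState(0).randn(130, 5)
y = np.array([0] * 100 + [1] * 30)

# 'balanced' adjusts class weights inversely proportional to class frequencies;
# it does not undersample the majority class in the bootstrap.
clf = RandomForestClassifier(n_estimators=100, class_weight='balanced',
                             random_state=0)
clf.fit(X, y)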
Random forest is very effective on a wide range of problems, but like bagging, the standard algorithm does not perform well on imbalanced classification problems. Balanced Random Forest (BRF) overcomes this limitation by making the class priors equal, either by downsampling or oversampling: it repeatedly draws bootstrap samples with equal proportions of data points from the minority and the majority class.
For comparison, scikit-learn's RandomForestClassifier is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
What you're looking for is the BalancedBaggingClassifier from imblearn.
imblearn.ensemble.BalancedBaggingClassifier(base_estimator=None,
n_estimators=10, max_samples=1.0, max_features=1.0, bootstrap=True,
bootstrap_features=False, oob_score=False, warm_start=False, ratio='auto',
replacement=False, n_jobs=1, random_state=None, verbose=0)
Effectively, it allows you to successively undersample your majority class while fitting an estimator on top. You can use random forest or any other base estimator from scikit-learn. Here is an example.
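A minimal sketch of how it can be used; the data here are synthetic, and note that in recent imbalanced-learn versions the ratio and base_estimator keywords shown in the signature above may instead be called sampling_strategy and estimator:

from collections import Counter

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

from imblearn.ensemble import BalancedBaggingClassifier

# Synthetic imbalanced data: roughly 90% class 0, 10% class 1.
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each bagged estimator is fit on a bootstrap sample in which the majority
# class has been randomly undersampled to match the minority class.
bbc = BalancedBaggingClassifier(base_estimator=RandomForestClassifier(),
                                n_estimators=10,
                                ratio='auto',
                                replacement=False,
                                random_state=0)
bbc.fit(X_train, y_train)

print(Counter(y_train))            # original (imbalanced) class distribution
print(bbc.score(X_test, y_test))   # accuracy on the held-out split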