Unbalanced classification using RandomForestClassifier in sklearn

Tags: python, machine-learning, classification, scikit-learn, random-forest

I have a dataset where the classes are unbalanced. The classes are either '1' or '0', and the ratio of class '1' to class '0' is 5:1. How do you calculate the prediction error for each class and rebalance the weights accordingly in sklearn with Random Forest, similar to what is described in the following link: http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#balance


asked Nov 19 '13 21:11

mlo

1 Answer

You can pass the sample_weight argument to the Random Forest fit method:

sample_weight : array-like, shape = [n_samples] or None

Sample weights. If None, then samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node. In the case of classification, splits are also ignored if they would result in any single class carrying a negative weight in either child node.
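A minimal sketch of passing per-sample weights to fit. It assumes synthetic data from make_classification with roughly a 1:5 ratio of class 0 to class 1; the dataset and the weight of 5 for class 0 are illustrative choices, not taken from the question:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative data: ~1/6 of samples in class 0, ~5/6 in class 1
# (make_classification's `weights` gives the class proportions).
X, y = make_classification(
    n_samples=1200, n_features=10, weights=[1 / 6, 5 / 6], random_state=0
)

# One weight per (observation, label) pair: up-weight the minority class 0.
sample_weight = np.where(y == 0, 5.0, 1.0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y, sample_weight=sample_weight)  # weights are used when growing the trees

pred = clf.predict(X)
```

The weights only affect training; predict takes no weights.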

In older versions there was a preprocessing.balance_weights function that generated weights for given samples such that the classes become uniformly distributed. It is still there, in the internal but still usable preprocessing._weights module, but it is deprecated and will be removed in future versions. I don't know the exact reasons for this.
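As far as I know, current scikit-learn releases no longer ship balance_weights. The closest replacement I'm aware of is sklearn.utils.class_weight.compute_sample_weight with class_weight="balanced", which weights each sample by n_samples / (n_classes * count of its class); RandomForestClassifier also accepts class_weight="balanced" (or "balanced_subsample") directly. A small sketch checking that formula on toy labels with the question's 5:1 ratio (the labels are made up for illustration):

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

# Toy labels: 100 samples of class 0, 500 of class 1 (class 1 : class 0 = 5:1).
y = np.array([0] * 100 + [1] * 500)

# "balanced": each sample gets n_samples / (n_classes * count of its class).
w = compute_sample_weight(class_weight="balanced", y=y)

# The same weights computed by hand from the class counts.
counts = np.bincount(y)
manual = (len(y) / (len(counts) * counts))[y]
```

Here class 0 gets 600 / (2 * 100) = 3.0 and class 1 gets 600 / (2 * 500) = 0.6: the same 5:1 ratio as the hand-written weights in the update, rescaled so the weights sum to n_samples.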

Update

Some clarification, since you seem to be confused. Using sample_weight is straightforward once you remember that here its purpose is to balance the target classes in the training dataset. That is, if you have X as observations and y as classes (labels), then len(X) == len(y) == len(sample_weight), and each element of the 1-d sample_weight array is the weight for the corresponding (observation, label) pair. In your case, class 1 is represented five times as often as class 0, so to balance the class distributions you could simply use

import numpy as np
sample_weight = np.array([5 if i == 0 else 1 for i in y])  # 5 for class 0, 1 for class 1

This assigns a weight of 5 to every class-0 instance and a weight of 1 to every class-1 instance. See the balance_weights function mentioned above for a slightly more elaborate way of computing the weights.
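The question also asks for the prediction error of each class. A rough sketch, assuming out-of-bag (OOB) predictions are an acceptable stand-in for the per-class error estimates on Breiman's page: fit with oob_score=True, turn oob_decision_function_ into labels, and measure the error rate within each true class. The helper per_class_oob_error and the synthetic data are hypothetical, for illustration only:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def per_class_oob_error(X, y, sample_weight=None, random_state=0):
    """Hypothetical helper: fit a forest and return the OOB error rate of each class."""
    clf = RandomForestClassifier(
        n_estimators=200, oob_score=True, random_state=random_state
    )
    clf.fit(X, y, sample_weight=sample_weight)
    # OOB class probabilities -> OOB predicted labels.
    oob_pred = clf.classes_[np.argmax(clf.oob_decision_function_, axis=1)]
    # Error of class c = fraction of true-class-c samples predicted wrongly out of bag.
    return np.array([np.mean(oob_pred[y == c] != c) for c in clf.classes_])


# Illustrative data with roughly 1 class-0 sample per 5 class-1 samples.
X, y = make_classification(
    n_samples=1200, n_features=10, weights=[1 / 6, 5 / 6], random_state=0
)
err_plain = per_class_oob_error(X, y)
err_weighted = per_class_oob_error(X, y, sample_weight=np.where(y == 0, 5.0, 1.0))
print("unweighted per-class OOB error:", err_plain)
print("weighted per-class OOB error:  ", err_weighted)
```

How much the weighting shifts error between the classes depends on the data. As I read the linked page, the idea is to adjust the class weights until the per-class errors are roughly equal; that would be a loop around a helper like this one and is not shown here.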

answered Oct 11 '22 13:10

alko
