Class weights vs under/oversampling

In imbalanced classification (with scikit-learn) what would be the difference of balancing classes (i.e. set class_weight to balanced) to oversampling with SMOTE for example? What would be the expected effects of one vs the other?

Mario L asked Apr 12 '19 18:04

People also ask

Is it better to oversample or undersample?

Oversampling methods duplicate or create new synthetic examples in the minority class, whereas undersampling methods delete or merge examples in the majority class. Both types of resampling can be effective when used in isolation, although can be more effective when both types of methods are used together.

When should you oversample?

In extreme cases where the number of observations in the rare class(es) is really small, oversampling is better, as you will not lose important information on the distribution of the other classes in the dataset.

What is the problem with oversampling?

It doesn't lead to any loss of information, and in some cases, may perform better than undersampling. But oversampling isn't perfect either. Because oversampling often involves replicating minority events, it can lead to overfitting.

Is oversampling good in machine learning?

For machine learning algorithms affected by a skewed class distribution, such as artificial neural networks and SVMs, random oversampling can be a highly effective technique.


1 Answer

Class weights directly modify the loss function by giving more (or less) penalty to the classes with more (or less) weight. In effect, one is basically sacrificing some ability to predict the lower weight class (the majority class for unbalanced datasets) by purposely biasing the model to favor more accurate predictions of the higher weighted class (the minority class).
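As a minimal sketch of the class-weight approach, the snippet below fits two logistic regressions on a synthetic imbalanced dataset, one with scikit-learn's `class_weight="balanced"` and one without (the dataset parameters here are illustrative, not from the question):

```python
# Sketch: default vs class-weighted logistic regression on an
# imbalanced synthetic dataset (95/5 class split).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y,
                                          random_state=0)

plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
weighted = LogisticRegression(class_weight="balanced",
                              max_iter=1000).fit(X_tr, y_tr)

# The weighted model typically trades some majority-class accuracy
# for better recall on the minority class (label 1 here).
print("minority recall, plain:   ",
      recall_score(y_te, plain.predict(X_te)))
print("minority recall, balanced:",
      recall_score(y_te, weighted.predict(X_te)))
```

With `class_weight="balanced"`, scikit-learn scales each class's loss contribution by `n_samples / (n_classes * n_class_samples)`, which is exactly the loss-function modification described above.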

Oversampling and undersampling methods essentially give more weight to particular classes as well (duplicating observations duplicates the penalty for those particular observations, giving them more influence in the model fit). But because resampling changes the data itself rather than the loss function, it interacts with the train/validation splitting that typically takes place during training (for example, if you resample before splitting, duplicates of the same observation can land on both sides of the split), so the two approaches will yield slightly different results in practice.
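For contrast, here is a sketch of plain random oversampling using `sklearn.utils.resample` (SMOTE itself lives in the separate imbalanced-learn package; this shows the simple duplication variant the answer describes, on a toy array rather than real data):

```python
# Sketch: random oversampling of the minority class by duplicating
# its rows until the class counts match.
import numpy as np
from sklearn.utils import resample

rng = np.random.RandomState(0)
X = rng.randn(100, 3)
y = np.array([0] * 95 + [1] * 5)  # 95/5 imbalance

X_maj, y_maj = X[y == 0], y[y == 0]
X_min, y_min = X[y == 1], y[y == 1]

# Sample minority rows with replacement up to the majority count.
X_min_up, y_min_up = resample(X_min, y_min, replace=True,
                              n_samples=len(y_maj), random_state=0)

X_bal = np.vstack([X_maj, X_min_up])
y_bal = np.concatenate([y_maj, y_min_up])
print(np.bincount(y_bal))  # both classes now have 95 samples
```

Note that to avoid leakage, resampling like this should be applied only to the training portion after splitting, never to the full dataset.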

Please refer to https://datascience.stackexchange.com/questions/52627/why-class-weight-is-outperforming-oversampling

Constanza Garcia answered Oct 02 '22 02:10