sklearn logistic regression with unbalanced classes


I'm solving a classification problem with sklearn's logistic regression in Python.

My problem is a general/generic one. I have a dataset with two classes/outcomes (positive/negative or 1/0), but the set is highly unbalanced: roughly 5% positives and 95% negatives.

I know there are a number of ways to deal with an unbalanced problem like this, but I have not found a good explanation of how to implement them properly using the sklearn package.

What I've done thus far is to build a balanced training set by selecting all entries with a positive outcome plus an equal number of randomly selected negative entries (a sketch of this step is below). I can then train the model on this set, but I'm stuck on how to modify the model so it works on the original unbalanced population/set.
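
Concretely, here is roughly what I'm doing (illustrative only; assume X is a NumPy feature matrix and y a 0/1 label array):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.RandomState(0)
    pos_idx = np.where(y == 1)[0]
    neg_idx = np.where(y == 0)[0]

    # Keep all positives and an equal number of randomly chosen negatives
    neg_sample = rng.choice(neg_idx, size=len(pos_idx), replace=False)
    balanced_idx = np.concatenate([pos_idx, neg_sample])

    clf = LogisticRegression()
    clf.fit(X[balanced_idx], y[balanced_idx])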

What are the specific steps to do this? I've pored over the sklearn documentation and examples and haven't found a good explanation.

asked Feb 13 '13 by agentscully


People also ask

Can logistic regression be used for an imbalanced classification problem?

Logistic regression does not support imbalanced classification directly. Instead, the training algorithm used to fit the logistic regression model must be modified to take the skewed distribution into account.
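
For example, a minimal sketch of this modification in scikit-learn, using class_weight (a real LogisticRegression parameter) on toy data that mirrors the question's ~5%/95% split:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    # Toy data with a ~95%/5% class split, mirroring the question
    X, y = make_classification(n_samples=2000, weights=[0.95, 0.05],
                               random_state=0)

    # class_weight='balanced' reweights classes inversely to their
    # frequency, so errors on the rare class are penalized more heavily
    clf = LogisticRegression(class_weight='balanced').fit(X, y)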

How do you deal with unbalanced datasets in logistic regression?

In logistic regression, another handy technique for working with an imbalanced distribution is to use class weights set in accordance with the class distribution. A class weight determines how heavily the algorithm is penalized for a wrong prediction on that class.
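
A minimal sketch with explicit class weights, assuming the ~5%/95% split from the question (the 19x weight is just 95/5 and purely illustrative):

    from sklearn.linear_model import LogisticRegression

    # Weight the rare positive class ~19x (95/5) so the total penalty
    # contributed by each class is roughly equal
    clf = LogisticRegression(class_weight={0: 1.0, 1: 19.0})
    clf.fit(X, y)  # X, y as in the sketch above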

Do you need balanced data for logistic regression?

Logistic regression requires a dependent variable in binary form, i.e., 0 and 1. A balanced sample would mean that if you have thirty 0s you also need thirty 1s, but logistic regression imposes no such condition.


1 Answer

Have you tried passing class_weight="auto" to your classifier? Not all classifiers in sklearn support this, but some do. Check the docstrings.
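
(Note: in later scikit-learn releases, class_weight="auto" was deprecated in favor of class_weight="balanced", which computes weights as n_samples / (n_classes * np.bincount(y)).)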

You can also rebalance your dataset by randomly dropping negative examples and/or over-sampling positive examples (potentially adding some slight Gaussian feature noise), as in the sketch below.
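
A rough sketch of the over-sampling variant (my own illustration, assuming NumPy arrays X and y with 0/1 labels; the noise scale is arbitrary):

    import numpy as np

    rng = np.random.RandomState(0)
    pos_idx = np.where(y == 1)[0]
    n_extra = int((y == 0).sum() - len(pos_idx))  # copies needed to balance

    # Resample positives with replacement and add slight Gaussian noise
    picks = rng.choice(pos_idx, size=n_extra, replace=True)
    noise = rng.normal(scale=0.01 * X.std(axis=0),
                       size=(n_extra, X.shape[1]))
    X_balanced = np.vstack([X, X[picks] + noise])
    y_balanced = np.concatenate([y, np.ones(n_extra, dtype=y.dtype)])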

answered Sep 25 '22 by ogrisel