Can sample weight be used in Spark MLlib Random Forest training?

I am using the Spark 1.5.0 MLlib Random Forest algorithm (Scala code) for two-class classification. The dataset I am using is highly imbalanced, so the majority class is down-sampled at a 10% sampling rate.

Is it possible to use the corresponding sampling weight (10 in this case) in Spark Random Forest training? I don't see a weight among the input parameters for trainClassifier() in RandomForest.
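For reference, a minimal sketch of the down-sampling setup described above, using the RDD-based MLlib API; the label encoding (0.0 as the majority class) and the hyperparameter values are illustrative placeholders, not my actual job settings:

```scala
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.RandomForest
import org.apache.spark.mllib.tree.model.RandomForestModel
import org.apache.spark.rdd.RDD

// Down-sample the majority class (assumed here to be label 0.0) at 10%,
// then train with the standard trainClassifier() call, whose argument
// list has no per-sample weight parameter.
def trainDownSampled(data: RDD[LabeledPoint]): RandomForestModel = {
  val minority = data.filter(_.label == 1.0)
  val majority = data.filter(_.label == 0.0)
    .sample(withReplacement = false, fraction = 0.1, seed = 42L)

  RandomForest.trainClassifier(
    minority.union(majority),
    numClasses = 2,
    categoricalFeaturesInfo = Map[Int, Int](),
    numTrees = 100,
    featureSubsetStrategy = "auto",
    impurity = "gini",
    maxDepth = 10,
    maxBins = 32)
}
```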

asked Mar 11 '16 by machine_learner


1 Answer

Not at all in Spark 1.5, and only partially (LogisticRegression/LinearRegression) in Spark 1.6:

https://issues.apache.org/jira/browse/SPARK-7685

Here is the umbrella JIRA tracking all the subtasks:

https://issues.apache.org/jira/browse/SPARK-9610
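To illustrate the partial support mentioned above, spark.ml's LogisticRegression accepts a per-row weight column from Spark 1.6 onward, so the 10% down-sampling could be compensated by weighting the retained majority rows. This is only a sketch; the column names and the weight values are assumptions, not part of the question:

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.sql.DataFrame

// `train` is assumed to be a DataFrame with "features", "label" and a
// "weight" column (e.g. 10.0 for the kept majority rows, 1.0 for the
// minority rows). The weight column is passed via setWeightCol, which
// LogisticRegression supports since Spark 1.6.
def fitWeighted(train: DataFrame) = {
  new LogisticRegression()
    .setLabelCol("label")
    .setFeaturesCol("features")
    .setWeightCol("weight")
    .fit(train)
}
```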

answered Sep 25 '22 by Edi Bice