Spark Multi Label classification

1 Answer

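Spark also provides Logistic Regression. Note that, according to the API documentation, it supports binary and multiclass (multinomial) classification, where each example gets exactly one label. It does not produce multilabel output on its own, so for a true multilabel problem it has to be combined with a per-label scheme.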

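The problem you have in scikit-learn with the huge amount of training data should largely go away with Spark, because training is distributed across the cluster, provided you use an appropriate Spark configuration (for example, enough executor memory and a sensible number of partitions).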

Another approach is to train one binary classifier per label and obtain the multilabel prediction by running a relevant/irrelevant prediction for each label (often called binary relevance). You can easily do that in Spark using any binary classifier.

Indirectly, it might also help to do multilabel classification with nearest neighbors, which is also considered state of the art. There are some nearest-neighbor extensions for Spark, such as Spark KNN or Spark KNN graphs.
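To make the nearest-neighbor idea concrete, here is a small single-machine sketch in plain Python. It does not use the API of Spark KNN or any other Spark extension. The function `knn_multilabel_predict`, the vote threshold, and the toy data are assumptions for illustration, and the simple majority vote is a simplification rather than a full multilabel kNN algorithm.

```python
import math
from collections import Counter


def knn_multilabel_predict(train, query, k=3, threshold=0.5):
    """Predict a label set by voting among the k nearest training points.

    train: list of (feature_vector, set_of_labels) pairs.
    A label is predicted when more than `threshold` of the neighbours carry it.
    """
    # Brute-force search: sort all training points by Euclidean distance.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Count how many of the neighbours carry each label.
    votes = Counter(label for _, labels in neighbours for label in labels)
    return {
        label
        for label, count in votes.items()
        if count / len(neighbours) > threshold
    }


# Toy data (assumed): two clusters with overlapping label sets.
train = [
    ((0.0, 0.0), {"a"}),
    ((0.1, 0.2), {"a", "b"}),
    ((0.2, 0.1), {"a"}),
    ((5.0, 5.0), {"c"}),
    ((5.1, 4.9), {"b", "c"}),
    ((4.8, 5.2), {"c"}),
]

print(knn_multilabel_predict(train, (0.1, 0.1)))
print(knn_multilabel_predict(train, (0.1, 0.1), threshold=0.3))
```

Lowering the threshold trades precision for recall: with the default, only labels carried by a majority of neighbors are returned, while a lower value also admits labels seen on a single neighbor. On large data the brute-force sort is the part a distributed nearest-neighbor library would replace.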

120

answered Oct 05 '22 04:10

marilena.oita

Tags:

apache-spark

scikit-learn

pyspark

Mohamed Karim Bouaziz
