Difference between org.apache.spark.ml.classification and org.apache.spark.mllib.classification

I'm writing a Spark application and would like to use the algorithms in MLlib. In the API docs I found two different classes for the same algorithm.

For example, there is a LogisticRegression in org.apache.spark.ml.classification and also a LogisticRegressionWithSGD in org.apache.spark.mllib.classification.

The only difference I can find is that the one in org.apache.spark.ml inherits from Estimator and can be used in cross-validation. I'm quite confused that they are placed in different packages. Does anyone know the reason for this? Thanks!

asked May 14 '15 07:05

ailzhang

2 Answers

There is a JIRA ticket for this change.

And from the design doc:

MLlib now covers a basic selection of machine learning algorithms, e.g., logistic regression, decision trees, alternating least squares, and k-means. The current set of APIs contains several design flaws that prevent us moving forward to address practical machine learning pipelines, make MLlib itself a scalable project.

The new set of APIs will live under org.apache.spark.ml, and o.a.s.mllib will be deprecated once we migrate all features to o.a.s.ml.
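To illustrate what "practical machine learning pipelines" can look like under org.apache.spark.ml, here is a minimal sketch. It assumes Spark 2.x or later (it uses SparkSession, which did not exist when the question was asked) and a local master; the object name, toy data, and column names are made up for illustration.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

// Hypothetical example object; not part of Spark itself.
object PipelineSketch {

  // Fits a three-stage pipeline on toy data and returns one prediction per row.
  def predictions(spark: SparkSession): Array[Double] = {
    // Toy training set (assumption): id, raw text, binary label.
    val training = spark.createDataFrame(Seq(
      (0L, "a b c d e spark", 1.0),
      (1L, "b d", 0.0),
      (2L, "spark f g h", 1.0),
      (3L, "hadoop mapreduce", 0.0)
    )).toDF("id", "text", "label")

    // Each stage reads one DataFrame column and writes another.
    val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
    val hashingTF = new HashingTF()
      .setNumFeatures(1000)
      .setInputCol(tokenizer.getOutputCol)
      .setOutputCol("features")
    val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.001)

    // The Pipeline is itself an Estimator; fit() returns a PipelineModel.
    val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))
    val model = pipeline.fit(training)

    model.transform(training).select("prediction").collect().map(_.getDouble(0))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("PipelineSketch")
      .getOrCreate()
    println(predictions(spark).mkString(", "))
    spark.stop()
  }
}
```

Because feature extraction and the classifier are chained into a single Estimator, the whole pipeline can be handed to tools such as CrossValidator as one unit.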

answered Oct 11 '22 11:10

yjshen

The Spark MLlib guide says:

spark.mllib contains the original API built on top of RDDs.

spark.ml provides higher-level API built on top of DataFrames for constructing ML pipelines.

and

Using spark.ml is recommended because with DataFrames the API is more versatile and flexible. But we will keep supporting spark.mllib along with the development of spark.ml. Users should be comfortable using spark.mllib features and expect more features coming. Developers should contribute new algorithms to spark.ml if they fit the ML pipeline concept well, e.g., feature extractors and transformers.

I think the doc explains it very well.
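To make the RDD versus DataFrame distinction concrete, here is a rough side-by-side sketch of logistic regression in both packages. It assumes Spark 2.x or later and a local master; the object name and toy data are invented for illustration. On the spark.mllib side it uses LogisticRegressionWithLBFGS rather than the LogisticRegressionWithSGD mentioned in the question, since the SGD variant is deprecated in later Spark releases.

```scala
import org.apache.spark.ml.classification.{LogisticRegression => MlLogisticRegression}
import org.apache.spark.ml.linalg.{Vectors => MlVectors}
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.linalg.{Vectors => MllibVectors}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.sql.SparkSession

// Hypothetical example object; not part of Spark itself.
object MlVsMllib {

  // Toy data (assumption) shared by both variants: (label, feature values).
  val rows: Seq[(Double, Array[Double])] = Seq(
    (1.0, Array(0.0, 1.1, 0.1)),
    (0.0, Array(2.0, 1.0, -1.0)),
    (0.0, Array(2.0, 1.3, 1.0)),
    (1.0, Array(0.0, 1.2, -0.5))
  )

  // spark.ml: an Estimator fit on a DataFrame; the fitted model is a Transformer.
  def mlPredictions(spark: SparkSession): Array[Double] = {
    val df = spark
      .createDataFrame(rows.map { case (y, x) => (y, MlVectors.dense(x)) })
      .toDF("label", "features")
    val model = new MlLogisticRegression().setMaxIter(10).setRegParam(0.01).fit(df)
    model.transform(df).select("prediction").collect().map(_.getDouble(0))
  }

  // spark.mllib: trained directly on an RDD[LabeledPoint].
  def mllibPredictions(spark: SparkSession): Array[Double] = {
    val rdd = spark.sparkContext.parallelize(
      rows.map { case (y, x) => LabeledPoint(y, MllibVectors.dense(x)) }
    )
    val model = new LogisticRegressionWithLBFGS().setNumClasses(2).run(rdd)
    model.predict(rdd.map(_.features)).collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("MlVsMllib")
      .getOrCreate()
    println("spark.ml:    " + mlPredictions(spark).mkString(", "))
    println("spark.mllib: " + mllibPredictions(spark).mkString(", "))
    spark.stop()
  }
}
```

Note that the two packages have separate vector types (org.apache.spark.ml.linalg versus org.apache.spark.mllib.linalg), which is why the imports are aliased here.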

answered Oct 11 '22 11:10

JasonWayne

Tags: scala, apache-spark, apache-spark-mllib
