MLlib dependency error

I'm trying to build a very simple Scala standalone app using MLlib, but I get the following error when trying to build the program:

Object Mllib is not a member of package org.apache.spark

Then I realized that I have to add MLlib as a dependency, as follows:

version := "1"
scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
"org.apache.spark"  %% "spark-core"              % "1.1.0",
"org.apache.spark"  %% "spark-mllib"             % "1.1.0"
)

But here I got an error that says:

unresolved dependency spark-core_2.10.4;1.1.1 : not found

So I had to modify it to:

"org.apache.spark" % "spark-core_2.10" % "1.1.1",

But there is still an error that says:

unresolved dependency spark-mllib;1.1.1 : not found

Does anyone know how to add the MLlib dependency in the .sbt file?

asked Dec 12 '14 by user3789843


People also ask

Is MLlib deprecated?

No. MLlib includes both the RDD-based API and the DataFrame-based API. The RDD-based API is now in maintenance mode.

What is MLlib?

Built on top of Spark, MLlib is a scalable machine learning library consisting of common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, dimensionality reduction, and underlying optimization primitives.
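As a concrete taste of the RDD-based side of that library, here is a minimal, self-contained sketch that clusters a handful of points with MLlib's KMeans; the app name and the sample data are made up for illustration:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object KMeansSketch {
  def main(args: Array[String]): Unit = {
    // Local SparkContext, just for the example
    val sc = new SparkContext(new SparkConf().setAppName("kmeans-sketch").setMaster("local[2]"))
    // Four 2-D points that form two obvious clusters
    val points = sc.parallelize(Seq(
      Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
      Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.1)))
    // Train a 2-cluster model with at most 10 iterations
    val model = KMeans.train(points, 2, 10)
    model.clusterCenters.foreach(println)
    sc.stop()
  }
}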

Is MLlib part of Spark?

MLlib is developed as part of the Apache Spark project. It thus gets tested and updated with each Spark release.

What is the difference between Spark ML and Spark MLlib?

spark.mllib is the first of the two Spark ML APIs, while org.apache.spark.ml is the new one. spark.mllib carries the original API built on top of RDDs, whereas spark.ml contains a higher-level API built on top of DataFrames for constructing ML pipelines.
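The split is also visible in the package names you import; both classes below exist in current Spark releases and are shown only to illustrate the two package trees:

// RDD-based API (spark.mllib) operates on RDDs of vectors / labeled points
import org.apache.spark.mllib.clustering.KMeans
// DataFrame-based API (spark.ml) operates on DataFrames and Pipelines
import org.apache.spark.ml.clustering.{KMeans => KMeansOnDataFrames}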


2 Answers

As @lmm pointed out, you can instead include the libraries as:

libraryDependencies ++= Seq( "org.apache.spark" % "spark-core_2.10" % "1.1.0", "org.apache.spark" % "spark-mllib_2.10" % "1.1.0" )

In sbt, %% appends the Scala version to the artifact name; you are building with Scala 2.10.4, whereas the Spark artifacts are published against 2.10 in general.
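To make the mechanics concrete, here is how the two operators expand for this build; the artifact names in the comments are what sbt will try to resolve:

scalaVersion := "2.10.4"

// %% appends the Scala version to the artifact name, so this line
// asks for spark-core_2.10.4, which is not published:
// libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0"

// % takes the artifact name literally, so the published _2.10 binary
// can be named explicitly:
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.1.0"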

It should be noted that if you are going to make an assembly jar to deploy your application, you may wish to mark spark-core as provided, e.g.:

libraryDependencies ++= Seq( "org.apache.spark" % "spark-core_2.10" % "1.1.0" % "provided", "org.apache.spark" % "spark-mllib_2.10" % "1.1.0" )

The spark-core package will already be on the classpath on the executors anyway.
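If you do go the assembly route, the setup might look roughly like this; the sbt-assembly plugin version, jar name and main class below are placeholders, not something from the question:

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

// build.sbt -- spark-core is provided, spark-mllib is bundled into the fat jar
libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-core_2.10"  % "1.1.0" % "provided",
  "org.apache.spark" % "spark-mllib_2.10" % "1.1.0"
)

Then build the jar and hand it to spark-submit, which supplies spark-core on the executors for you:

sbt assembly
spark-submit --class MyApp --master local[2] target/scala-2.10/myapp-assembly-1.jar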

answered Nov 05 '22 by Holden


Here is another way to add the dependency to your build.sbt file if you're using the Databricks sbt-spark-package plugin:

sparkComponents ++= Seq("sql", "hive", "mllib")
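sparkComponents comes from that plugin rather than from sbt itself, so the plugin first has to be added to the build; here is a minimal sketch, with the plugin version and sparkVersion below being assumptions for illustration:

// project/plugins.sbt
resolvers += "Spark Packages Repo" at "https://dl.bintray.com/spark-packages/maven/"
addSbtPlugin("org.spark-packages" % "sbt-spark-package" % "0.2.6")

// build.sbt -- the plugin derives the matching spark-sql, spark-hive and
// spark-mllib dependencies from sparkVersion and sparkComponents
sparkVersion := "2.2.0"
sparkComponents ++= Seq("sql", "hive", "mllib")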
answered Nov 05 '22 by Powers