
build.sbt: how to add spark dependencies

Hello, I am trying to pull in spark-core, spark-streaming, twitter4j, and spark-streaming-twitter with the build.sbt file below:

name := "hello"

version := "1.0"

scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1"

libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.4.1"

libraryDependencies ++= Seq(
  "org.twitter4j" % "twitter4j-core" % "3.0.3",
  "org.twitter4j" % "twitter4j-stream" % "3.0.3"
)

libraryDependencies += "org.apache.spark" % "spark-streaming-twitter_2.10" % "0.9.0-incubating"

I simply copied these libraryDependencies lines from various places online, so I am not sure which versions, etc. to use.

Can someone please explain how I should fix this .sbt file? I spent a couple of hours trying to figure it out, but none of the suggestions worked. I installed Scala through Homebrew and I am on version 2.11.8.

All of my errors were about:

Modules were resolved with conflicting cross-version suffixes. 
Bobby asked Jun 22 '16 03:06



1 Answer

The problem is that you are mixing Scala 2.11 and 2.10 artifacts. You have:

scalaVersion := "2.11.8" 

And then:

libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.4.1" 

This line requests the Scala 2.10 build of the artifact, which conflicts with the 2.11 artifacts used elsewhere. You are also mixing Spark versions instead of using a consistent version:

// spark 1.6.1
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1"

// spark 1.4.1
libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.4.1"

// spark 0.9.0-incubating
libraryDependencies += "org.apache.spark" % "spark-streaming-twitter_2.10" % "0.9.0-incubating"
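For context on the suffix conflict, here is a small sketch of how the two operators differ. With `%%`, sbt appends the Scala binary version from `scalaVersion` to the artifact name, whereas a single `%` uses the name exactly as written. The fragment only illustrates the equivalence and is not a complete build:

```scala
scalaVersion := "2.11.8"

// With %%, sbt appends the Scala binary version ("_2.11" here) to the artifact name.
libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.6.1"

// Equivalent explicit form with a single %: the suffix is written out by hand.
// A hard-coded "_2.10" here is what produces the cross-version conflict.
// libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" % "1.6.1"
```

Using `%%` everywhere keeps the suffix in step with `scalaVersion`, so changing the Scala version later does not require editing each dependency.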

Here is a build.sbt that fixes both problems:

name := "hello"

version := "1.0"

scalaVersion := "2.11.8"

val sparkVersion = "1.6.1"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-streaming" % sparkVersion,
  "org.apache.spark" %% "spark-streaming-twitter" % sparkVersion
)

You also don't need to manually add twitter4j dependencies since they are added transitively by spark-streaming-twitter.
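One way to check the transitive twitter4j dependency is to compile a small program that touches a twitter4j type without declaring twitter4j in the build. The sketch below assumes the Spark 1.6.1 build.sbt from this answer. The object name `TwitterCheck`, the app name, and the 10-second batch interval are made up for illustration. Running it also requires Twitter OAuth credentials supplied through twitter4j's usual configuration (for example the `twitter4j.oauth.*` system properties), which this sketch does not set up:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

// Hypothetical check program: if this compiles, twitter4j is on the classpath
// even though build.sbt never lists it explicitly.
object TwitterCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("twitter-check").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Passing None makes twitter4j read OAuth credentials from its default configuration.
    val tweets = TwitterUtils.createStream(ssc, None)

    // getText is defined on twitter4j.Status, which arrives transitively.
    tweets.map(status => status.getText).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

If compilation succeeds with only the three Spark dependencies declared, the twitter4j classes are being resolved through spark-streaming-twitter.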

marcospereira answered Sep 20 '22 10:09