I am just getting started with Spark, so I downloaded the "For Hadoop 1 (HDP1, CDH3)" binaries from here and extracted them on an Ubuntu VM. Without installing Scala, I was able to execute the examples in the Quick Start guide from the Spark interactive shell.
As a side note, I observed that Spark has some of the best documentation among open source projects.
If you use the Scala language, ensure that Scala is already installed before using Apache Spark. You can also use Python instead of Scala for programming in Spark, but like Scala, it must be pre-installed.
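As a quick sanity check, you can verify both installations from a terminal (a minimal sketch; it assumes the scala and python binaries are already on your PATH):
$ scala -version     # prints the installed Scala version
$ python --version   # prints the installed Python version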
The most common method to include this additional dependency is to use the --packages argument of the spark-submit command. An example of --packages usage is shown in the “Execute” section below. The Apache Spark version in the build file must match the Spark version in your Spark cluster.
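For illustration, here is a hedged sketch of the --packages flag; the Maven coordinate, application class, and jar name are placeholders chosen for this example, not taken from the original post:
$ spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0 \
    --class com.example.MyApp \
    my-app.jar
Spark resolves the listed coordinate (and its transitive dependencies) from Maven repositories at submit time, so it does not need to be bundled into your application jar.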
You need to download the latest version of Scala. Here, the scala-2.11.6 version is used. After downloading, you will find the Scala tar file in the Downloads folder.
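A minimal sketch of unpacking that tarball and putting Scala on the PATH follows; the install location /usr/local/scala is just an assumption, adjust it to your setup:
$ cd ~/Downloads
$ tar -xzf scala-2.11.6.tgz          # unpack the downloaded Scala distribution
$ sudo mv scala-2.11.6 /usr/local/scala
$ export SCALA_HOME=/usr/local/scala
$ export PATH=$PATH:$SCALA_HOME/bin  # make the scala and scalac commands available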
Does Spark come included with Scala? If yes, where are the libraries/binaries?
The project configuration is placed in the project/ folder. In my case, it is:
$ ls project/
build.properties plugins.sbt project SparkBuild.scala target
When you run sbt/sbt assembly, it downloads the appropriate version of Scala along with the other project dependencies. Check out the target/ folder, for example:
$ ls target/
scala-2.9.2 streams
Note that the Scala version is 2.9.2 for me.
For running Spark in other modes (distributed), do I need to install Scala on all the nodes?
Yes. You can create a single assembly jar, as described in the Spark documentation:
If your code depends on other projects, you will need to ensure they are also present on the slave nodes. A popular approach is to create an assembly jar (or “uber” jar) containing your code and its dependencies. Both sbt and Maven have assembly plugins. When creating assembly jars, list Spark itself as a provided dependency; it need not be bundled since it is already present on the slaves. Once you have an assembled jar, add it to the SparkContext as shown here. It is also possible to submit your dependent jars one-by-one when creating a SparkContext.
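As a rough sketch of what submitting such an assembly jar can look like on newer Spark releases (the class name, master URL, and jar path below are placeholders, not values from the original post):
$ spark-submit --class com.example.MyApp \
    --master spark://master-host:7077 \
    target/scala-2.11/myapp-assembly-1.0.jar
Because Spark itself is marked as a provided dependency, the assembly jar carries only your code and its third-party libraries, and the cluster supplies the Spark classes at runtime.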