
Install Spark packages in Toree

I usually start my spark-shell with:

./bin/spark-shell --packages com.databricks:spark-csv_2.10:1.2.0,graphframes:graphframes:0.1.0-spark1.6,com.databricks:spark-avro_2.10:2.0.1

I'm now trying to use Apache Toree. Any idea how I should load these libraries in the notebook?

I tried the following:

jupyter toree install --user --spark_home=/home/eron/spark-1.6.1/ --spark_opts="--packages com.databricks:spark-csv_2.10:1.2.0,graphframes:graphframes:0.1.0-spark1.6,com.databricks:spark-avro_2.10:2.0.1"

But that did not seem to work.

elelias asked Dec 11 '22 16:12


2 Answers

When you have Apache Toree correctly installed as a kernel for Jupyter, you can define Maven dependencies from within a notebook cell like this:

%AddDeps org.apache.spark spark-mllib_2.10 1.6.2
%AddDeps com.github.haifengl smile-core 1.1.0 --transitive
%AddDeps io.reactivex rxscala_2.10 0.26.1 --transitive
%AddDeps com.chuusai shapeless_2.10 2.3.0 --repository https://oss.sonatype.org/content/repositories/releases/
%AddDeps org.tmoerman plongeur-spark_2.10 0.3.9 --repository file:/Users/tmo/.m2/repository

(excerpt from this notebook)

%AddDeps is a so-called magic, as documented in the Spark-kernel (now renamed Toree) wiki.
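
To move an existing --packages list over to this style, each group:artifact:version coordinate can be split into the three space-separated arguments that %AddDeps takes. Below is a minimal sketch using the coordinates from the question. The helper name to_adddeps is hypothetical and not part of Toree, and appending --transitive to every line is an assumption. A package that is not published to the default repository may also need the --repository option shown above.

```shell
# to_adddeps is a hypothetical helper, not part of Toree.
# It turns one Maven coordinate (group:artifact:version) into a %AddDeps line.
# The body runs in a subshell ( ... ) so its variables stay local.
to_adddeps() (
  group=${1%%:*}        # text before the first colon
  rest=${1#*:}          # text after the first colon
  artifact=${rest%%:*}  # text before the second colon
  version=${rest#*:}    # text after the second colon
  # --transitive is an assumption; drop it if transitive dependencies are not wanted.
  printf '%%AddDeps %s %s %s --transitive\n' "$group" "$artifact" "$version"
)

# Coordinates taken from the spark-shell command in the question.
for coord in \
  com.databricks:spark-csv_2.10:1.2.0 \
  graphframes:graphframes:0.1.0-spark1.6 \
  com.databricks:spark-avro_2.10:2.0.1
do
  to_adddeps "$coord"
done
```

The printed lines can be pasted into a notebook cell.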

Thomas Moerman answered Feb 06 '23 10:02


You can specify packages in the SPARK_OPTS environment variable:

export SPARK_OPTS='--packages com.databricks:spark-csv_2.10:1.4.0'
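
For several packages the value is a single comma-separated list, as in the spark-shell command from the question. Here is a rough sketch that builds it from separate coordinates. The helper name join_packages is hypothetical, and the sketch assumes Jupyter is started afterwards from the same shell so that the kernel inherits the variable.

```shell
# join_packages is a hypothetical helper: it joins its arguments with commas.
# The body runs in a subshell ( ... ) so the IFS change does not leak out.
join_packages() (
  IFS=,
  printf '%s\n' "$*"
)

# Coordinates taken from the spark-shell command in the question.
SPARK_OPTS="--packages $(join_packages \
  com.databricks:spark-csv_2.10:1.2.0 \
  graphframes:graphframes:0.1.0-spark1.6 \
  com.databricks:spark-avro_2.10:2.0.1)"
export SPARK_OPTS

# Show what the kernel would receive.
echo "$SPARK_OPTS"

# Assumption: Jupyter is then launched from this same shell, for example:
#   jupyter notebook
```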

Modifying spark-defaults.conf also works:

echo spark.jars.packages=com.databricks:spark-csv_2.10:1.4.0 | sudo tee -a $SPARK_HOME/conf/spark-defaults.conf
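
Because tee -a appends, running that command twice leaves two spark.jars.packages lines in the file. A minimal sketch of a guarded append follows. The helper name add_spark_packages is hypothetical and not a Spark command, and the demonstration writes to a scratch file rather than the real $SPARK_HOME/conf/spark-defaults.conf.

```shell
# add_spark_packages is a hypothetical helper, not a Spark command.
# Usage: add_spark_packages <conf-file> <comma-separated-coordinates>
add_spark_packages() {
  conf_file=$1
  packages=$2
  # Refuse to append if the file already sets spark.jars.packages.
  if [ -f "$conf_file" ] && grep -q '^spark\.jars\.packages' "$conf_file"; then
    echo "spark.jars.packages already set in $conf_file" >&2
    return 1
  fi
  printf 'spark.jars.packages=%s\n' "$packages" >> "$conf_file"
}

# Demonstrate on a scratch file created with mktemp.
demo_conf=$(mktemp)
add_spark_packages "$demo_conf" com.databricks:spark-csv_2.10:1.4.0
cat "$demo_conf"
```

Pointing the helper at the real configuration file may require sudo, as in the original command.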

Emre answered Feb 06 '23 08:02