I usually start my spark-shell with:
./bin/spark-shell --packages com.databricks:spark-csv_2.10:1.2.0,graphframes:graphframes:0.1.0-spark1.6,com.databricks:spark-avro_2.10:2.0.1
I'm trying to use Apache Toree now. Any idea how I should load these libraries in the notebook?
I tried the following:
jupyter toree install --user --spark_home=/home/eron/spark-1.6.1/ --spark_opts="--packages com.databricks:spark-csv_2.10:1.2.0,graphframes:graphframes:0.1.0-spark1.6,com.databricks:spark-avro_2.10:2.0.1"
but that did not seem to work.
When you have Apache Toree correctly installed as a kernel for Jupyter, you can define Maven dependencies from within a notebook cell like this:
%AddDeps org.apache.spark spark-mllib_2.10 1.6.2
%AddDeps com.github.haifengl smile-core 1.1.0 --transitive
%AddDeps io.reactivex rxscala_2.10 0.26.1 --transitive
%AddDeps com.chuusai shapeless_2.10 2.3.0 --repository https://oss.sonatype.org/content/repositories/releases/
%AddDeps org.tmoerman plongeur-spark_2.10 0.3.9 --repository file:/Users/tmo/.m2/repository
(excerpt from this notebook)
%AddDeps is a so-called magic, as documented in the Spark Kernel (now renamed Toree) wiki.
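To reuse the coordinates from the question's --packages string, a small shell helper can print the equivalent %AddDeps lines for pasting into a notebook cell. This is a minimal sketch. The function name packages_to_adddeps is hypothetical, it assumes every entry has the form group:artifact:version, and it adds --transitive because --packages also pulls in transitive dependencies. Packages hosted outside Maven Central (graphframes, for instance, was published to the Spark Packages repository) may additionally need a --repository option.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a comma-separated --packages value
# (group:artifact:version,...) into one Toree %AddDeps magic per coordinate.
packages_to_adddeps() {
  local coords coord group artifact version
  # Split the --packages value on commas into an array
  IFS=',' read -r -a coords <<< "$1"
  for coord in "${coords[@]}"; do
    # Split each Maven coordinate on colons
    IFS=':' read -r group artifact version <<< "$coord"
    # --transitive mirrors --packages, which also fetches dependencies
    echo "%AddDeps $group $artifact $version --transitive"
  done
}
```

For example, packages_to_adddeps 'com.databricks:spark-csv_2.10:1.2.0' prints %AddDeps com.databricks spark-csv_2.10 1.2.0 --transitive.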
You can specify packages in the SPARK_OPTS environment variable:
export SPARK_OPTS='--packages com.databricks:spark-csv_2.10:1.4.0'
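For the three packages in the question, the variable can be assembled from a list and exported in the shell that then starts Jupyter; environment variables are inherited by child processes, so it has to be exported before Jupyter is launched from that shell. A minimal sketch, assuming bash; the array name PACKAGES is only illustrative.

```shell
#!/usr/bin/env bash
# Illustrative list of Maven coordinates (taken from the question)
PACKAGES=(
  com.databricks:spark-csv_2.10:1.2.0
  graphframes:graphframes:0.1.0-spark1.6
  com.databricks:spark-avro_2.10:2.0.1
)
# "${PACKAGES[*]}" joins the elements with the first character of IFS,
# which is set to a comma only inside this command substitution's subshell
SPARK_OPTS="--packages $(IFS=','; echo "${PACKAGES[*]}")"
export SPARK_OPTS
# Then start Jupyter from this same shell so the Toree kernel inherits it:
#   jupyter notebook
```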
Modifying spark-defaults.conf also works:
echo spark.jars.packages=com.databricks:spark-csv_2.10:1.4.0 | sudo tee -a $SPARK_HOME/conf/spark-defaults.conf
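Appending with tee -a adds another line every time the command is run. A minimal sketch that replaces any existing spark.jars.packages entry instead, assuming bash and that it runs as a user who can write to $SPARK_HOME/conf; set_spark_packages is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set_spark_packages CONF_FILE PACKAGES
# Leaves exactly one spark.jars.packages entry in CONF_FILE.
set_spark_packages() {
  local conf="$1" packages="$2"
  mkdir -p "$(dirname "$conf")"
  touch "$conf"
  # Copy every line except an existing spark.jars.packages entry
  # (grep exits 1 when it selects no lines, hence "|| true")
  grep -v '^spark\.jars\.packages' "$conf" > "$conf.tmp" || true
  # Append the new value in the same key=value form as the one-liner
  echo "spark.jars.packages=$packages" >> "$conf.tmp"
  mv "$conf.tmp" "$conf"
}
```

Usage: set_spark_packages "$SPARK_HOME/conf/spark-defaults.conf" "com.databricks:spark-csv_2.10:1.4.0"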