I'm having trouble importing magellan-1.0.4-s_2.11
in spark notebook. I've downloaded the jar from https://spark-packages.org/package/harsha2010/magellan and have tried placing SPARK_HOME/bin/spark-shell --packages harsha2010:magellan:1.0.4-s_2.11
in the Start of Customized Settings
section of the spark-notebook file of the bin folder.
Here are my imports
import magellan.{Point, Polygon, PolyLine}
import magellan.coord.NAD83
import org.apache.spark.sql.magellan.MagellanContext
import org.apache.spark.sql.magellan.dsl.expressions._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
And my errors...
<console>:71: error: object Point is not a member of package org.apache.spark.sql.magellan
import magellan.{Point, Polygon, PolyLine}
^
<console>:72: error: object coord is not a member of package org.apache.spark.sql.magellan
import magellan.coord.NAD83
^
<console>:73: error: object MagellanContext is not a member of package org.apache.spark.sql.magellan
import org.apache.spark.sql.magellan.MagellanContext
I then tried to import the new library like any other library by placing it into the main script
like so:
$lib_dir/magellan-1.0.4-s_2.11.jar"
This didn't work and I'm left scratching my head wondering what I've done wrong. How do I import libraries such as magellan into spark notebook?
Try evaluating something like
:dp "harsha2010" % "magellan" % "1.0.4-s_2.11"
It will load the library into Spark, allowing it to be import
ed - assuming it can be obtained though the Maven repo. In my case it failed with a message:
failed to load 'harsha2010:magellan:jar:1.0.4-s_2.11 (runtime)' from ["Maven2 local (file:/home/dev/.m2/repository/, releases+snapshots) without authentication", "maven-central (http://repo1.maven.org/maven2/, releases+snapshots) without authentication", "spark-packages (http://dl.bintray.com/spark-packages/maven/, releases+snapshots) without authentication", "oss-sonatype (https://oss.sonatype.org/content/repositories/releases/, releases+snapshots) without authentication"] into /tmp/spark-notebook/aether/b2c7d8c5-1f56-4460-ad39-24c4e93a9786
I think file was to big and connection was interrupted before whole file could be downloaded.
Workaround
So I downloaded the JAR manually from:
http://dl.bintray.com/spark-packages/maven/harsha2010/magellan/1.0.4-s_2.11/
and copied it into the:
/tmp/spark-notebook/aether/b2c7d8c5-1f56-4460-ad39-24c4e93a9786/harsha2010/magellan/1.0.4-s_2.11
And then :dp
command worked. Try Calling it first, and if it will fail copy JAR into the right path to make things work.
Better solution
I should investigate why download failed to fix it in the first place... or put that library in my local M2 repo. But that should get you going.
I would suggest to check this:
https://github.com/spark-notebook/spark-notebook/blob/master/docs/metadata.md#import-download-dependencies
and
https://github.com/spark-notebook/spark-notebook/blob/master/docs/metadata.md#add-spark-packages
I think the :dp
magic command is depreciated, instead you should add your custom dependencies in the notebook metadata. You can go in the menu Edit > Edit notebook metadata, there add something like:
"customDeps": [
"harsha2010 % magellan % 1.0.4-s_2.11"
]
Once done, you will need to restart the kernel, you can check in the browser console if the package is being downloaded properly.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With