
How to import libraries in Spark Notebook

I'm having trouble importing magellan-1.0.4-s_2.11 in Spark Notebook. I've downloaded the JAR from https://spark-packages.org/package/harsha2010/magellan and have tried placing SPARK_HOME/bin/spark-shell --packages harsha2010:magellan:1.0.4-s_2.11 in the "Start of Customized Settings" section of the spark-notebook script in the bin folder.

Here are my imports:

import magellan.{Point, Polygon, PolyLine}
import magellan.coord.NAD83
import org.apache.spark.sql.magellan.MagellanContext
import org.apache.spark.sql.magellan.dsl.expressions._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

And my errors:

<console>:71: error: object Point is not a member of package org.apache.spark.sql.magellan
       import magellan.{Point, Polygon, PolyLine}
              ^
<console>:72: error: object coord is not a member of package org.apache.spark.sql.magellan
       import magellan.coord.NAD83
                       ^
<console>:73: error: object MagellanContext is not a member of package org.apache.spark.sql.magellan
       import org.apache.spark.sql.magellan.MagellanContext

I then tried to add the library like any other library, by appending the JAR to the list of JARs in the main script:

$lib_dir/magellan-1.0.4-s_2.11.jar"

This didn't work, and I'm left scratching my head wondering what I've done wrong. How do I import libraries such as magellan into Spark Notebook?

Asked by Curtis Chong on Mar 09 '17 at 03:03



2 Answers

Try evaluating something like

:dp "harsha2010" % "magellan" % "1.0.4-s_2.11"

It will load the library into Spark and make it importable, assuming it can be obtained through the Maven repositories. In my case it failed with this message:

failed to load 'harsha2010:magellan:jar:1.0.4-s_2.11 (runtime)' from ["Maven2 local (file:/home/dev/.m2/repository/, releases+snapshots) without authentication", "maven-central (http://repo1.maven.org/maven2/, releases+snapshots) without authentication", "spark-packages (http://dl.bintray.com/spark-packages/maven/, releases+snapshots) without authentication", "oss-sonatype (https://oss.sonatype.org/content/repositories/releases/, releases+snapshots) without authentication"] into /tmp/spark-notebook/aether/b2c7d8c5-1f56-4460-ad39-24c4e93a9786

I think the file was too big and the connection was interrupted before the whole file could be downloaded.

Workaround

So I downloaded the JAR manually from:

http://dl.bintray.com/spark-packages/maven/harsha2010/magellan/1.0.4-s_2.11/

and copied it into:

/tmp/spark-notebook/aether/b2c7d8c5-1f56-4460-ad39-24c4e93a9786/harsha2010/magellan/1.0.4-s_2.11

After that, the :dp command worked. Try calling it first; if it fails, copy the JAR into the matching directory under the aether path shown in your error message.
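Both the download URL and the target directory above follow the standard Maven 2 repository layout (group, then artifact, then version, then `artifact-version.jar`). The following is a minimal Scala sketch that derives both from the `group:artifact:version` coordinate; the `MavenLayout` object and its helpers are hypothetical names for illustration only. The aether directory is assumed to be the one printed in your own error message (the UUID differs per run), and since Bintray was shut down in 2021, the remote root may need to be swapped for whichever repository currently hosts the package.

```scala
// Sketch: derive Maven 2 layout paths for a coordinate such as
// "harsha2010:magellan:1.0.4-s_2.11" (the form used with --packages).
// Layout: <groupId with '.' -> '/'>/<artifactId>/<version>/<artifactId>-<version>.jar
object MavenLayout {
  final case class Coordinate(groupId: String, artifactId: String, version: String)

  // Splits "group:artifact:version" into its three parts.
  def parse(spec: String): Coordinate = spec.split(':') match {
    case Array(g, a, v) => Coordinate(g, a, v)
    case _ =>
      throw new IllegalArgumentException(s"expected group:artifact:version, got '$spec'")
  }

  // Directory relative to a repository root (remote URL or local cache).
  def directory(c: Coordinate): String =
    s"${c.groupId.replace('.', '/')}/${c.artifactId}/${c.version}"

  // File name of the main JAR inside that directory.
  def jarName(c: Coordinate): String = s"${c.artifactId}-${c.version}.jar"

  def main(args: Array[String]): Unit = {
    val coord = parse("harsha2010:magellan:1.0.4-s_2.11")
    // Remote root taken from the download URL in this answer.
    val remoteRoot = "http://dl.bintray.com/spark-packages/maven"
    // Local cache root: replace with the aether directory from YOUR error message.
    val localRoot = "/tmp/spark-notebook/aether/b2c7d8c5-1f56-4460-ad39-24c4e93a9786"

    println(s"download: $remoteRoot/${directory(coord)}/${jarName(coord)}")
    println(s"copy to:  $localRoot/${directory(coord)}/")
  }
}
```

For this coordinate the printed directory part is `harsha2010/magellan/1.0.4-s_2.11`, which matches both paths given above.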

Better solution

I should investigate why the download failed and fix that in the first place, or put the library in my local M2 repository. But this should get you going.

Answered by Mateusz Kubuszok on Oct 11 '22 at 04:10



I would suggest checking these sections of the documentation:

https://github.com/spark-notebook/spark-notebook/blob/master/docs/metadata.md#import-download-dependencies

and

https://github.com/spark-notebook/spark-notebook/blob/master/docs/metadata.md#add-spark-packages

I think the :dp magic command is deprecated; instead, you should add your custom dependencies to the notebook metadata. Go to Edit > Edit notebook metadata in the menu and add something like:

"customDeps": [
   "harsha2010 % magellan % 1.0.4-s_2.11"
]

Once done, restart the kernel. You can check in the browser console whether the package is being downloaded properly.
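As a complementary check from inside the notebook, here is a minimal Scala sketch that tries to load the classes named in the question's first import. The `ClasspathCheck` object and `isLoadable` helper are hypothetical names, not part of Spark Notebook or magellan, and the sketch assumes the class loader of a notebook cell can see dependencies added through `customDeps`. If the names still print MISSING after the kernel restart, the dependency was most likely not resolved; in a plain JVM without the magellan JAR, all of them will print MISSING.

```scala
// Sketch: report whether the magellan classes from the question's imports
// can be loaded by the current class loader.
object ClasspathCheck {
  // Looks the class up without initializing it. LinkageError is caught too,
  // because a class may be found while one of its own dependencies is missing.
  def isLoadable(className: String): Boolean =
    try {
      Class.forName(className, false, getClass.getClassLoader)
      true
    } catch {
      case _: ClassNotFoundException | _: LinkageError => false
    }

  def main(args: Array[String]): Unit = {
    // Class names taken from `import magellan.{Point, Polygon, PolyLine}`.
    val names = Seq("magellan.Point", "magellan.Polygon", "magellan.PolyLine")
    for (name <- names) {
      val status = if (isLoadable(name)) "found" else "MISSING"
      println(s"$name: $status")
    }
  }
}
```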

Answered by 0asa on Oct 11 '22 at 04:10
