 

No module named graphframes Jupyter Notebook

I'm following this installation guide but have the following problem using graphframes:

from pyspark import SparkContext
sc =SparkContext()
!pyspark --packages graphframes:graphframes:0.5.0-spark2.1-s_2.11
from graphframes import *

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input> in <module>()
----> 1 from graphframes import *

ImportError: No module named graphframes

I'm not sure whether it is possible to install the package this way, but I'd appreciate your advice and help.

Daniel Chepenko asked May 11 '18 06:05



3 Answers

Good question!

Open your ~/.bashrc file and add export SPARK_OPTS="--packages graphframes:graphframes:0.5.0-spark2.1-s_2.11". Once you have saved the file, close it and run source ~/.bashrc.

Finally, open up your notebook and type:

from pyspark import SparkContext
sc = SparkContext()
sc.addPyFile('/home/username/spark-2.3.0-bin-hadoop2.7/jars/graphframes-0.5.0-spark2.1-s_2.11.jar')

After that, you should be able to import graphframes.
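The reason passing a .jar to sc.addPyFile() can work at all is that a jar is a zip archive, and Python's zipimport machinery can load modules directly out of zip-style files once the archive is on sys.path. A minimal sketch of that mechanism, with no Spark required (demo_pkg is a made-up module name, not part of graphframes):

```python
import os
import sys
import tempfile
import zipfile

# Build a tiny zip archive containing a throwaway package,
# standing in for the graphframes jar:
tmpdir = tempfile.mkdtemp()
archive = os.path.join(tmpdir, "demo_pkg.jar")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("demo_pkg/__init__.py", "ANSWER = 42\n")

# This is essentially what addPyFile does on the driver side:
sys.path.insert(0, archive)
import demo_pkg

print(demo_pkg.ANSWER)  # -> 42
```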

Pranav A answered Nov 08 '22 09:11



I'm using a Jupyter notebook in Docker, trying to get graphframes working. First, following the method in https://stackoverflow.com/a/35762809/2202107, I have:

import findspark
findspark.init()
import pyspark
import os

SUBMIT_ARGS = "--packages graphframes:graphframes:0.7.0-spark2.4-s_2.11 pyspark-shell"
os.environ["PYSPARK_SUBMIT_ARGS"] = SUBMIT_ARGS

conf = pyspark.SparkConf()
sc = pyspark.SparkContext(conf=conf)
print(sc._conf.getAll())

Then, following this GitHub issue, we are finally able to import graphframes: https://github.com/graphframes/graphframes/issues/172

import sys
pyfiles = str(sc.getConf().get(u'spark.submit.pyFiles')).split(',')
sys.path.extend(pyfiles)
from graphframes import *
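What the snippet above relies on is that Spark records the paths of the downloaded --packages dependencies as a single comma-separated string under spark.submit.pyFiles; splitting it and extending sys.path is what makes the archives importable. A sketch of just that string handling, with made-up placeholder paths standing in for sc.getConf().get('spark.submit.pyFiles'):

```python
import sys

# Placeholder for the comma-separated value Spark would return;
# these paths are hypothetical, not real downloaded dependencies.
pyfiles_conf = "/tmp/spark-deps/graphframes.zip,/tmp/spark-deps/extra.zip"

pyfiles = pyfiles_conf.split(",")
sys.path.extend(pyfiles)  # now `import graphframes` could find the archive

print(pyfiles)  # -> ['/tmp/spark-deps/graphframes.zip', '/tmp/spark-deps/extra.zip']
```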
Sida Zhou answered Nov 08 '22 09:11



The simplest way to use Jupyter with pyspark and graphframes is to launch Jupyter from pyspark itself.

Just open your terminal, set the two environment variables, and start pyspark with the graphframes package:

export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS=notebook
pyspark --packages graphframes:graphframes:0.6.0-spark2.3-s_2.11

A further advantage is that if you later want to run your code via spark-submit, you can use the same --packages flag in the start command.
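For example, a batch submission would reuse the same flag like this (my_graph_job.py is a placeholder for your own script, not a file from the answer):

```shell
# same --packages flag as the notebook launch, now for batch submission;
# my_graph_job.py is a hypothetical script name
spark-submit --packages graphframes:graphframes:0.6.0-spark2.3-s_2.11 my_graph_job.py
```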

Alex Ortner answered Nov 08 '22 09:11
