I was running PySpark from an IPython notebook by simply adding the following at the top of the notebook:
import os
os.chdir('../data_files')
import sys
import pandas as pd
%pylab inline
from IPython.display import Image

# point Python at the local Spark distribution and its bundled py4j
os.environ['SPARK_HOME'] = "spark-1.3.1-bin-hadoop2.6"
sys.path.append(os.path.join(os.environ['SPARK_HOME'], 'python'))
sys.path.append(os.path.join(os.environ['SPARK_HOME'], 'bin'))
sys.path.append(os.path.join(os.environ['SPARK_HOME'], 'python/lib/py4j-0.8.2.1-src.zip'))

from pyspark import SparkContext
sc = SparkContext('local')
This worked fine for one project, but on my second project, after running a couple of lines (not the same ones every time), I get the following error:
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/py4j-0.8.2.1-py2.7.egg/py4j/java_gateway.py", line 425, in start
self.socket.connect((self.address, self.port))
File "/usr/lib/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused
---------------------------------------------------------------------------
Py4JNetworkError Traceback (most recent call last)
<ipython-input-21-4626925bbe8f> in <module>()
----> 1 words.count()
/home/eee/Desktop/NLP/spark-1.3.1-bin-hadoop2.6/python/pyspark/rdd.pyc in count(self)
930 3
931 """
--> 932 return self.mapPartitions(lambda i: [sum(1 for _ in i)]).sum()
933
934 def stats(self):
/home/eee/Desktop/NLP/spark-1.3.1-bin-hadoop2.6/python/pyspark/rdd.pyc in sum(self)
921 6.0
922 """
--> 923 return self.mapPartitions(lambda x: [sum(x)]).reduce(operator.add)
924
925 def count(self):
/home/eee/Desktop/NLP/spark-1.3.1-bin-hadoop2.6/python/pyspark/rdd.pyc in reduce(self, f)
737 yield reduce(f, iterator, initial)
738
--> 739 vals = self.mapPartitions(func).collect()
740 if vals:
741 return reduce(f, vals)
/home/eee/Desktop/NLP/spark-1.3.1-bin-hadoop2.6/python/pyspark/rdd.pyc in collect(self)
710 Return a list that contains all of the elements in this RDD.
711 """
--> 712 with SCCallSiteSync(self.context) as css:
713 port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
714 return list(_load_from_socket(port, self._jrdd_deserializer))
/home/eee/Desktop/NLP/spark-1.3.1-bin-hadoop2.6/python/pyspark/traceback_utils.pyc in __enter__(self)
70 def __enter__(self):
71 if SCCallSiteSync._spark_stack_depth == 0:
---> 72 self._context._jsc.setCallSite(self._call_site)
73 SCCallSiteSync._spark_stack_depth += 1
74
/usr/local/lib/python2.7/dist-packages/py4j-0.8.2.1-py2.7.egg/py4j/java_gateway.pyc in __call__(self, *args)
534 END_COMMAND_PART
535
--> 536 answer = self.gateway_client.send_command(command)
537 return_value = get_return_value(answer, self.gateway_client,
538 self.target_id, self.name)
/usr/local/lib/python2.7/dist-packages/py4j-0.8.2.1-py2.7.egg/py4j/java_gateway.pyc in send_command(self, command, retry)
360 the Py4J protocol.
361 """
--> 362 connection = self._get_connection()
363 try:
364 response = connection.send_command(command)
/usr/local/lib/python2.7/dist-packages/py4j-0.8.2.1-py2.7.egg/py4j/java_gateway.pyc in _get_connection(self)
316 connection = self.deque.pop()
317 except Exception:
--> 318 connection = self._create_connection()
319 return connection
320
/usr/local/lib/python2.7/dist-packages/py4j-0.8.2.1-py2.7.egg/py4j/java_gateway.pyc in _create_connection(self)
323 connection = GatewayConnection(self.address, self.port,
324 self.auto_close, self.gateway_property)
--> 325 connection.start()
326 return connection
327
/usr/local/lib/python2.7/dist-packages/py4j-0.8.2.1-py2.7.egg/py4j/java_gateway.pyc in start(self)
430 'server'
431 logger.exception(msg)
--> 432 raise Py4JNetworkError(msg)
433
434 def close(self):
Py4JNetworkError: An error occurred while trying to connect to the Java server
Once this happens, other lines that worked before now raise the same error. Any ideas?
In order to run PySpark in a Jupyter notebook, you first need to locate the PySpark installation; I will use the findspark package to do so. Since this is a third-party package, we need to install it before using it:
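pip install findspark

Then, in the notebook, a minimal sketch (assuming findspark can locate your Spark install, e.g. via the SPARK_HOME environment variable) replaces all the manual sys.path bookkeeping from the question:

import findspark
findspark.init()  # locates SPARK_HOME and adds pyspark to sys.path

from pyspark import SparkContext
sc = SparkContext('local')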
Specifications for:
pyspark 1.4.1
ipython 4.0.0
[OSX / homebrew]
If you want to launch pyspark within a Jupyter (formerly IPython) notebook using the IPython kernel, I advise you to launch your notebook directly with the pyspark command:
$ pyspark
But in order to do that, you first need to add three lines to your bash ~/.profile (or zsh ~/.zshrc) to set these environment variables:
export SPARK_HOME=/path/to/apache-spark/1.4.1/libexec
export PYSPARK_DRIVER_PYTHON=ipython2 # remember that Apache Spark only works with Python 2.7
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
In my case, given that I'm on OS X and installed apache-spark with Homebrew, these are:
export SPARK_HOME=/usr/local/Cellar/apache-spark/1.4.1/libexec
export PYSPARK_DRIVER_PYTHON=ipython2
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
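These exports only take effect in new shells, so reload your profile (or open a fresh terminal) before launching:

source ~/.profile   # or: source ~/.zshrc, if you use zsh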
Then, when you execute the command pyspark in your terminal, it will automatically open a Jupyter (formerly IPython) notebook in your default browser:
$ pyspark
[I 17:51:00.209 NotebookApp] Serving notebooks from local directory: /Users/Thibault/code/kaggle
[I 17:51:00.209 NotebookApp] 0 active kernels
[I 17:51:00.210 NotebookApp] The IPython Notebook is running at: http://localhost:42424/
[I 17:51:00.210 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[I 17:51:11.980 NotebookApp] Kernel started: 53ad11b1-4fa4-459d-804c-0487036b0f29
15/09/02 17:51:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
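Note that when the notebook is launched through pyspark like this, the driver has already created a SparkContext for you, available as sc in every notebook, so there is no need to construct one by hand. A quick sanity check (the numbers here are just an illustration):

sc  # already defined by the pyspark launcher
sc.parallelize(range(100)).count()  # should return 100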