I want to create more than one SparkContext in a console. According to a post on the mailing list, I need to call SparkConf.set('spark.driver.allowMultipleContexts', true). That seems reasonable, but it does not work. Does anyone have experience with this? Thanks a lot.
Below is what I did and the error message; I ran this in an IPython Notebook:
from pyspark import SparkConf, SparkContext
conf = SparkConf().setMaster("spark://10.21.208.21:7077").set("spark.driver.allowMultipleContexts", "true")
conf.getAll()
[(u'spark.eventLog.enabled', u'true'),
(u'spark.driver.allowMultipleContexts', u'true'),
(u'spark.driver.host', u'10.20.70.80'),
(u'spark.app.name', u'pyspark-shell'),
(u'spark.eventLog.dir', u'hdfs://10.21.208.21:8020/sparklog'),
(u'spark.master', u'spark://10.21.208.21:7077')]
sc1 = SparkContext(conf=conf.setAppName("app 1")) ## this sc success
sc1
<pyspark.context.SparkContext at 0x1b7cf10>
sc2 = SparkContext(conf=conf.setAppName("app 2")) ## this failed
ValueError Traceback (most recent call last)
<ipython-input-23-e6dcca5aec38> in <module>()
----> 1 sc2 = SparkContext(conf=conf.setAppName("app 2"))
/usr/local/spark-1.2.0-bin-cdh4/python/pyspark/context.pyc in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc)
100 """
101 self._callsite = first_spark_call() or CallSite(None, None, None)
--> 102 SparkContext._ensure_initialized(self, gateway=gateway)
103 try:
104 self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
/usr/local/spark-1.2.0-bin-cdh4/python/pyspark/context.pyc in _ensure_initialized(cls, instance, gateway)
226 " created by %s at %s:%s "
227 % (currentAppName, currentMaster,
--> 228 callsite.function, callsite.file, callsite.linenum))
229 else:
230 SparkContext._active_spark_context = instance
ValueError: Cannot run multiple SparkContexts at once; existing SparkContext(app=app 1, master=spark://10.21.208.21:7077) created by __init__ at <ipython-input-21-fb3adb569241>:1
This is a PySpark-specific limitation that predates the spark.driver.allowMultipleContexts configuration (that setting concerns multiple SparkContext objects within a single JVM). PySpark disallows multiple active SparkContexts because several parts of its implementation assume that certain components have global shared state.
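Since only one SparkContext can be active per PySpark process, the usual workaround is to stop the existing context before creating the next one. A minimal sketch, reusing the conf object from the question (the master URL is the one from your setup; run this in a fresh shell):

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("spark://10.21.208.21:7077")

sc1 = SparkContext(conf=conf.setAppName("app 1"))
# ... run your jobs on sc1 ...
sc1.stop()  # release the active context before creating another

sc2 = SparkContext(conf=conf.setAppName("app 2"))  # now succeeds

In later PySpark releases there is also SparkContext.getOrCreate(conf), which returns the already-running context instead of raising, but it still does not give you two active contexts in the same process.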