If I start up pyspark and then run this command:
    import my_script; spark = my_script.Sparker(sc); spark.collapse('./data/')
Everything is A-ok. If, however, I try to do the same thing through the command line with spark-submit, I get an error:
Command:

    /usr/local/spark/bin/spark-submit my_script.py collapse ./data/

      File "/usr/local/spark/python/pyspark/rdd.py", line 352, in func
        return f(iterator)
      File "/usr/local/spark/python/pyspark/rdd.py", line 1576, in combineLocally
        merger.mergeValues(iterator)
      File "/usr/local/spark/python/pyspark/shuffle.py", line 245, in mergeValues
        for k, v in iterator:
      File "/.../my_script.py", line 173, in _json_args_to_arr
        js = cls._json(line)
    RuntimeError: uninitialized staticmethod object
my_script:

    ...
    if __name__ == "__main__":
        args = sys.argv[1:]
        if args[0] == 'collapse':
            directory = args[1]
            from pyspark import SparkContext
            sc = SparkContext(appName="Collapse")
            spark = Sparker(sc)
            spark.collapse(directory)
            sc.stop()
Why is this happening? What's the difference between running pyspark and running spark-submit that would cause this divergence? And how can I make this work in spark-submit?
EDIT: I tried running this from the bash shell by doing

    pyspark my_script.py collapse ./data/

and I got the same error. The only time everything works is when I am in the Python shell and import the script.
The spark-submit command is a utility for running or submitting a Spark or PySpark application (or job) to a cluster, with options and configurations specified on the command line. The application you submit can be written in Scala, Java, or Python (PySpark).
The spark-submit script in Spark's bin directory is used to launch applications on a cluster. It can use all of Spark's supported cluster managers through a uniform interface so you don't have to configure your application especially for each one.
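For example, a minimal local invocation might look like the following sketch; the script name and data path mirror the question, while the master URL and application name are illustrative (--master and --name are standard spark-submit options):

    /usr/local/spark/bin/spark-submit \
        --master "local[4]" \
        --name "Collapse" \
        my_script.py collapse ./data/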
Spark is an awesome framework, and the Scala and Python APIs are both great for most workflows. PySpark is more popular because Python is the most popular language in the data community. PySpark is a well-supported, first-class Spark API and is a great choice for most organizations.
If you have built a Spark application, you need to use spark-submit to run it.
The code can be written in either Python or Scala.
The mode can be either local or cluster (see the example below).
If you just want to test or run a few individual commands, you can use the shell that Spark provides.
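As a rough sketch, the mode is selected with the --master option (and, for clusters, --deploy-mode). Both commands reuse my_script.py from the question, and the second assumes a YARN cluster is available:

    # Local mode: driver and executors run on this machine, using 4 threads
    spark-submit --master "local[4]" my_script.py collapse ./data/

    # Cluster mode (here via YARN): the driver runs inside the cluster
    spark-submit --master yarn --deploy-mode cluster my_script.py collapse ./data/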
The pyspark command starts a REPL (read-eval-print loop), an interactive shell used to try out a few PySpark commands. It is typically used during development, and it is Python-specific.
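For instance, the workflow from the question runs interactively like this (the pyspark shell pre-defines sc as a SparkContext; my_script and Sparker come from the question):

    $ pyspark
    >>> import my_script
    >>> spark = my_script.Sparker(sc)   # sc is created by the shell itself
    >>> spark.collapse('./data/')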
To run a Spark application written in Scala or Python, locally or on a cluster, you use spark-submit.