
How do I find out the deploy mode of a PySpark application?

Tags: apache-spark, pyspark, cluster-computing

I am trying to fix an out-of-memory issue, and I want to know whether I need to change the memory settings in the default configuration file (spark-defaults.conf) in the Spark home folder, or whether I can set them in the code.

I saw the question "PySpark: java.lang.OutofMemoryError: Java heap space", and it says that this depends on whether I'm running in client mode. I'm running Spark on a cluster and monitoring it using the standalone cluster manager.

But how do I figure out whether I'm running Spark in client mode?


asked Jul 14 '16 21:07

makansij

1 Answer

If you are running an interactive shell, e.g. pyspark (CLI or via an IPython notebook), by default you are running in client mode. You can easily verify that you cannot run pyspark or any other interactive shell in cluster mode:

$ pyspark --master yarn --deploy-mode cluster
Python 2.7.11 (default, Mar 22 2016, 01:42:54)
[GCC Intel(R) C++ gcc 4.8 mode] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Error: Cluster deploy mode is not applicable to Spark shells.

$ spark-shell --master yarn --deploy-mode cluster
Error: Cluster deploy mode is not applicable to Spark shells.

Examining the contents of the bin/pyspark file may also be instructive. Here is its final line, which is what actually launches the shell:

$ pwd
/home/ctsats/spark-1.6.1-bin-hadoop2.6
$ cat bin/pyspark
[...]
exec "${SPARK_HOME}"/bin/spark-submit pyspark-shell-main --name "PySparkShell" "$@"
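The sketch below shows that pass-through in Python. It builds the same argument list that this exec line hands to spark-submit, and any options you type after pyspark are appended unchanged. The helper name pyspark_argv is hypothetical and only illustrates the shell line; it does not launch anything.

```python
from typing import List


def pyspark_argv(spark_home: str, user_args: List[str]) -> List[str]:
    """Build the argument list that the last line of bin/pyspark execs.

    Mirrors: exec "${SPARK_HOME}"/bin/spark-submit pyspark-shell-main --name "PySparkShell" "$@"
    user_args stands in for "$@", i.e. everything typed after `pyspark`.
    """
    return [
        f"{spark_home}/bin/spark-submit",
        "pyspark-shell-main",
        "--name", "PySparkShell",
        # Forwarded unchanged, so options such as --master, --deploy-mode
        # or --driver-memory reach spark-submit exactly as given.
        *user_args,
    ]


argv = pyspark_argv("/home/ctsats/spark-1.6.1-bin-hadoop2.6",
                    ["--master", "yarn", "--deploy-mode", "cluster"])
print(" ".join(argv))
```

Because --deploy-mode cluster ends up on spark-submit's command line next to the pyspark-shell-main resource, it is spark-submit that rejects the combination. This is consistent with the "Cluster deploy mode is not applicable to Spark shells" error shown earlier.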

That is, pyspark is actually a wrapper script that runs spark-submit under the application name PySparkShell, which is how you can find it in the Spark History Server UI. Since it is launched this way, it obeys whatever arguments (or defaults) apply to that spark-submit command.
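Here is how this applies to the original out-of-memory question. Spark's configuration documentation ranks the sources as follows: properties set on a SparkConf in code take precedence, then flags passed to spark-submit (and therefore to pyspark), then spark-defaults.conf. The same documentation notes that in client mode, spark.driver.memory must not be set through SparkConf in the application, because the driver JVM has already started at that point. Use --driver-memory or the defaults file instead. The sketch below models those two rules in plain Python. The names parse_spark_defaults, effective_conf and DRIVER_LAUNCH_KEYS, and the sample file contents, are hypothetical. The parser only handles the whitespace-separated "key value" lines used in the spark-defaults.conf template.

```python
from typing import Dict, List, Tuple

# Hypothetical set of properties that size the driver JVM itself. Assumes the
# application was started via spark-submit or pyspark, so the driver is already
# running when this code executes and cannot be resized from SparkConf.
DRIVER_LAUNCH_KEYS = {"spark.driver.memory"}


def parse_spark_defaults(text: str) -> Dict[str, str]:
    """Parse spark-defaults.conf text: one 'key value' pair per line, '#' comment lines."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # key, then the rest of the line as the value
        if len(parts) == 2:
            props[parts[0]] = parts[1].strip()
    return props


def effective_conf(defaults: Dict[str, str],
                   submit_flags: Dict[str, str],
                   app_conf: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Merge sources lowest-precedence first: defaults file, then spark-submit
    flags, then SparkConf set in code. Driver-sizing keys set in code are
    reported instead of applied, because they arrive after the driver JVM started."""
    merged = {**defaults, **submit_flags}
    notes: List[str] = []
    for key, value in app_conf.items():
        if key in DRIVER_LAUNCH_KEYS:
            notes.append(f"{key}={value} set in code has no effect on the running "
                         "driver; use --driver-memory or spark-defaults.conf")
            continue
        merged[key] = value
    return merged, notes


defaults = parse_spark_defaults("""
# hypothetical spark-defaults.conf
spark.master            spark://master:7077
spark.driver.memory     1g
""")
flags = {"spark.driver.memory": "4g"}  # as if started with: pyspark --driver-memory 4g
app = {"spark.driver.memory": "8g", "spark.executor.memory": "4g"}  # set in code

conf, notes = effective_conf(defaults, flags, app)
print(conf["spark.driver.memory"])    # 4g
print(conf["spark.executor.memory"])  # 4g
print(notes)
```

By contrast, spark.executor.memory can be set in code, as long as it is set before the SparkContext is created. In the pyspark shell, sc already exists at startup, so it is usually simpler to pass such settings on the pyspark command line or put them in spark-defaults.conf.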

answered Dec 24 '22 07:12

desertnaut

