
PySpark fix/remove console progress bar

As can be seen below, the Spark console progress bar is messing up my output. Is there a configuration option or flag that turns off the stage progress bar? Or better, how do I fix the console log so that the progress bar disappears once the stages have finished? This may just be a PySpark bug, but I'm not sure.

(CID, (v1 / n1, v2 / n2))
[Stage 46:============================================>           (19 + 4) / 24]('1', (0.020000000000000035, 4.805))
('5', (6.301249999999998, 0.125))
('10', (21.78000000000001, 3.125))
('7', (0.005000000000000009, 0.6049999999999996))

(CID, sqrt(v1 / n1 + v2 / n2))
('1', 2.19658826364888)
('5', 2.5350049309616733)
('10', 4.990490957811667)
('7', 0.7810249675906652)

(CID, (AD_MEAN, NCI_MEAN))
('7', (1.0, 5.5))
('5', (7.75, 5.3))
('10', (13.5, 6.0))
('1', (3.0, 5.0))

(CID, (AD_MEAN - NCI_MEAN))
('7', -4.5)
('5', 2.45)
('1', -2.0)
('10', 7.5)

(CID, (NUMER, DENOM))
[Stage 100:===================================================>   (30 + 2) / 32]('10', (7.5, 4.990490957811667))
('5', (2.45, 2.5350049309616733))
('7', (-4.5, 0.7810249675906652))
('1', (-2.0, 2.19658826364888))

It gets even worse sometimes (scroll to the right):

$ spark-submit main.py 
17/04/28 11:36:23 WARN Utils: Your hostname, Pandora resolves to a loopback address: 127.0.1.1; using 146.95.36.193 instead (on interface wlp3s0)
17/04/28 11:36:23 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
17/04/28 11:36:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[Stage 0:>                                                          (0 + 2                                                                          [Stage 32:=============================>                            (4 + 4[Stage 37:>                                                         (0 + 0[Stage 35:=====>           (4 + 2) / 12][Stage 37:>                 (0 + 0[Stage 35:===========>     (8 + 4) / 12][Stage 37:>                 (0 + 0[Stage 37:=======>                                                  (1 + 3[Stage 37:=============================>                            (4 + 0[Stage 36:========>       (13 + 4) / 24][Stage 37:=========>        (4 + 0[Stage 36:==============> (21 + 3) / 24][Stage 37:=========>        (4 + 1[Stage 37:====================================>                     (5 + 3[Stage 38:===================================>                    (20 + 4)[Stage 38:====================================================>   (30 + 2)                                                                          SORTED (t-value, CID)
[(-5.761659596980321, '7'), (-0.9105029072119708, '1'), (0.9664675480810896, '5'), (1.5028581483070664, '10')]
Dobob asked Apr 24 '17 21:04


3 Answers

You could disable it by either setting

  • spark.ui.showConsoleProgress = false

or

  • raising the logging level in log4j.properties above INFO, e.g. to ERROR

Relevant Spark JIRA issues:

  • https://issues.apache.org/jira/browse/SPARK-4017
  • https://issues.apache.org/jira/browse/SPARK-18719

spark.ui.showConsoleProgress has been available since Spark 1.2, but is only documented starting with Spark 2.2.

Example code:

spark.conf.set('spark.ui.showConsoleProgress', False)
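
If the setting has no effect when applied to a session that is already running, one option is to pass it at launch time with spark-submit's --conf flag. Below is a minimal sketch that only builds the command line. It assumes the script is called main.py, as in the question, and the helper name build_submit_command is made up for illustration:

```python
import shlex


def build_submit_command(script, conf):
    """Build a spark-submit command line with one --conf flag per setting."""
    cmd = ["spark-submit"]
    for key, value in conf.items():
        # Each setting is passed as: --conf key=value
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(script)
    return cmd


cmd = build_submit_command("main.py", {"spark.ui.showConsoleProgress": "false"})
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd), or the printed line can be typed into a shell directly.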
Tagar answered Nov 20 '22 11:11


Tagar's answer didn't work for me in PySpark.

Here is the workaround I found to remove progress bars from the console:

from pyspark import SparkContext, SparkConf
from pyspark.sql.session import SparkSession


# Set the option on the SparkConf before the SparkContext is created
conf = SparkConf().set("spark.ui.showConsoleProgress", "false")
sc = SparkContext(appName="RandomForest", conf=conf)
spark = SparkSession(sc)

Hope this helps!

Clement T. answered Nov 20 '22 11:11


Here is how you would do it using the SparkSession builder in PySpark 2.4.x:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master('local')
    .appName('MySparkApplication')
    .config('spark.ui.showConsoleProgress', 'false')  # <=====
    .getOrCreate()
)
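
To check whether the option was actually picked up, the value can be read back from the running context's configuration. Below is a rough sketch, assuming pyspark is installed. The names QUIET_CONF, build_quiet_session and progress_bar_setting are invented here and are not part of PySpark:

```python
# Settings to apply before the session (and its SparkContext) is created.
QUIET_CONF = {"spark.ui.showConsoleProgress": "false"}


def build_quiet_session(app_name, extra_conf=None):
    """Create (or reuse) a SparkSession with the quiet settings applied."""
    # Imported lazily so this module can be loaded without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in {**QUIET_CONF, **(extra_conf or {})}.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def progress_bar_setting(spark):
    """Return the value the running SparkContext holds for the option."""
    return spark.sparkContext.getConf().get("spark.ui.showConsoleProgress")
```

Note that getOrCreate() returns an existing session if there is one, so in a shell or notebook where a session is already running, the context may have been started without this option and the check can return something other than 'false'.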
Onema answered Nov 20 '22 13:11