
PySpark print to console

When running a PySpark job on a Dataproc cluster like this:

gcloud --project <project_name> dataproc jobs submit pyspark --cluster <cluster_name> <python_script>

my print statements don't show up in my terminal.

Is there any way to output data onto the terminal in PySpark when running jobs on the cloud?

Edit: I would like to print/log info from within my transformation. For example:

def print_funct(l):
    print(l)    # print the element, then pass it through unchanged
    return l

# apply to every element and force evaluation with collect()
rddData.map(lambda l: print_funct(l)).collect()

This should print every line of data in the RDD rddData.

Doing some digging, I found this answer for logging; however, testing it gives me the results of this question, whose answer states that logging isn't possible within the transformation.

Roman asked May 24 '16 07:05




1 Answer

Printing or logging inside of a transform will end up in the Spark executor logs, which can be accessed through your application's ApplicationMaster or HistoryServer via the YARN ResourceManager web UI.
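If you prefer the command line, a hedged alternative is to pull the aggregated executor logs on the master node with the standard YARN CLI; this only works if YARN log aggregation is enabled on the cluster, and the application ID below is a placeholder:

# fetch aggregated container logs for a finished application (placeholder ID)
yarn logs -applicationId application_1464070000000_0001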

You could alternatively collect the information you are printing alongside your output (e.g. in a dict or tuple). You could also stash it away in an accumulator and then print it from the driver.
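For instance, here is a minimal sketch of the accumulator approach; the sample data, names, and the local SparkContext setup are illustrative placeholders, not part of the original post:

from pyspark import SparkContext

# hypothetical local context; a submitted Dataproc job would use its existing SparkContext
sc = SparkContext.getOrCreate()

# placeholder data standing in for rddData from the question
rddData = sc.parallelize(["spark", "print", "debug"])

# accumulator that executors add to and the driver reads after an action
line_count = sc.accumulator(0)

def tag_with_length(l):
    line_count.add(1)     # updated on the executors during the transformation
    return (l, len(l))    # carry the info you wanted to print alongside the output

result = rddData.map(tag_with_length).collect()

# these run on the driver, so they do appear in your terminal
print(result)
print("elements seen:", line_count.value)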

If you are doing a lot of print-statement debugging, you might find it faster to SSH into your master node and use the pyspark REPL or IPython to experiment with your code. This would also let you pass the --master local flag, which makes your print statements appear in stdout.
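As a rough sketch of that workflow (the -m suffix is the usual Dataproc naming convention for the master VM; the project and zone values are placeholders):

# SSH into the Dataproc master node (master VMs are typically named <cluster_name>-m)
gcloud compute ssh <cluster_name>-m --project <project_name> --zone <zone>

# on the master node, start a local-mode REPL so print() output appears in your terminal
pyspark --master local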

Patrick Clay answered Oct 16 '22 11:10