This is what I get when I use toDebugString in Scala:
scala> val a = sc.parallelize(Array(1,2,3)).distinct
a: org.apache.spark.rdd.RDD[Int] = MappedRDD[3] at distinct at <console>:12
scala> a.toDebugString
res0: String =
(4) MappedRDD[3] at distinct at <console>:12
| ShuffledRDD[2] at distinct at <console>:12
+-(4) MappedRDD[1] at distinct at <console>:12
| ParallelCollectionRDD[0] at parallelize at <console>:12
This is the equivalent in Python:
>>> a = sc.parallelize([1,2,3]).distinct()
>>> a.toDebugString()
'(4) PythonRDD[6] at RDD at PythonRDD.scala:43\n | MappedRDD[5] at values at NativeMethodAccessorImpl.java:-2\n | ShuffledRDD[4] at partitionBy at NativeMethodAccessorImpl.java:-2\n +-(4) PairwiseRDD[3] at RDD at PythonRDD.scala:261\n | PythonRDD[2] at RDD at PythonRDD.scala:43\n | ParallelCollectionRDD[0] at parallelize at PythonRDD.scala:315'
As you can see, the output is not as nice in Python as in Scala. Is there any trick to get a nicer output from this function?
I am using Spark 1.1.0.
toDebugString method to get the RDD lineage graph in Spark
To indicate shuffle boundaries, the toDebugString method uses indentation. The number in round brackets shows the level of parallelism at each stage, for example (4) in the output above.
In Spark, the dependencies between RDDs are recorded as a graph. In simpler words, every step is part of the lineage. By calling the toDebugString method you are essentially asking for this lineage graph (i.e. the chain of every individual step that happened: the type of RDD created and the method used to create it) to be displayed.
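To make this "chain of steps" concrete, here is a minimal sketch (assuming an existing SparkContext named sc, as in the question, and the Python 2-style print used in the answer below); each transformation becomes one entry in the printed lineage, and reduceByKey adds a shuffle boundary shown as an indented "+-(N)" branch:

# A minimal sketch, assuming an existing SparkContext named sc.
# Each transformation adds one step to the lineage; reduceByKey
# introduces a shuffle boundary in the printed graph.
pairs = (sc.parallelize(range(10))
           .map(lambda x: (x % 3, x))
           .reduceByKey(lambda a, b: a + b))
print pairs.toDebugString()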
Spark is an awesome framework, and the Scala and Python APIs are both great for most workflows. PySpark is more popular because Python is the most popular language in the data community. PySpark is a well-supported, first-class Spark API and a great choice for most organizations.
Lineage is the mechanism an RDD uses to reconstruct lost partitions. Spark does not replicate the data in memory; if data is lost, the RDD uses its lineage to rebuild it. Each RDD remembers how it was built from other datasets.
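A hedged sketch of that idea (again assuming sc): an RDD marked for caching is not replicated, so if a cached partition is evicted or lost, Spark recomputes it from the recorded lineage. The storage level also shows up in the bracketed annotations that toDebugString prints, like the "[Serialized 1x Replicated]" tags in the answer output below.

# A minimal sketch: cache() keeps computed partitions in memory but does not
# replicate them; lost or evicted partitions are rebuilt from this lineage.
squares = sc.parallelize(range(100)).map(lambda x: x * x).cache()
squares.count()                # materializes (and caches) the partitions
print squares.toDebugString()  # lineage plus the RDD's storage-level annotation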
Try adding a print statement so that the debug string is actually printed, rather than displaying its __repr__:
>>> a = sc.parallelize([1,2,3]).distinct()
>>> print a.toDebugString()
(8) PythonRDD[27] at RDD at PythonRDD.scala:44 [Serialized 1x Replicated]
| MappedRDD[26] at values at NativeMethodAccessorImpl.java:-2 [Serialized 1x Replicated]
| ShuffledRDD[25] at partitionBy at NativeMethodAccessorImpl.java:-2 [Serialized 1x Replicated]
+-(8) PairwiseRDD[24] at distinct at <stdin>:1 [Serialized 1x Replicated]
| PythonRDD[23] at distinct at <stdin>:1 [Serialized 1x Replicated]
| ParallelCollectionRDD[21] at parallelize at PythonRDD.scala:358 [Serialized 1x Replicated]
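If you later move to Python 3 (where print is a function), the same trick applies; note that in some PySpark versions toDebugString() returns a bytes object, so you may need to decode it first, e.g.:

print(a.toDebugString().decode("utf-8"))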