The PySpark RDD documentation
http://spark.apache.org/docs/1.2.1/api/python/pyspark.html#pyspark.RDD
does not show any method(s) to display partition information for an RDD.
Is there any way to get that information without executing an additional step, e.g.:
myrdd.mapPartitions(lambda x: iter([1])).sum()
The above does work, but it seems like extra effort.
In PySpark you can get the current number of partitions by calling getNumPartitions() on the RDD; to use it with a DataFrame, you first need to convert the DataFrame to an RDD.
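For instance, a minimal sketch (assuming a live SparkContext sc and a SparkSession spark; the data and partition counts are only illustrative):
rdd = sc.parallelize(range(100), 4)
rdd.getNumPartitions()       # 4
df = spark.createDataFrame([(1, "a"), (2, "b")])
df.rdd.getNumPartitions()    # depends on the default parallelism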
As already mentioned above, one partition is created for each block of the file in HDFS (64 MB by default). However, when creating an RDD you can pass a second argument that defines the number of partitions to create. A call like the one sketched below creates an RDD named textFile with 5 partitions.
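The original line of code is not reproduced here; it presumably looks something like this (the HDFS path is a placeholder):
textFile = sc.textFile("hdfs:///path/to/somefile.txt", 5)
textFile.getNumPartitions()   # 5 or more, depending on the input splits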
You can use Scala's Try class and execute SHOW PARTITIONS on the required table, then check numPartitions. If the value is -1, the table is not partitioned.
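A rough PySpark sketch of the same idea (the answer itself uses Scala's Try; the helper name and table name here are placeholders):
from pyspark.sql.utils import AnalysisException

def num_table_partitions(spark, table_name):
    try:
        # SHOW PARTITIONS raises AnalysisException if the table is not partitioned
        return spark.sql("SHOW PARTITIONS {}".format(table_name)).count()
    except AnalysisException:
        return -1

num_table_partitions(spark, "my_table")   # -1 means "my_table" is not partitioned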
I missed it: very simple:
rdd.getNumPartitions()
Not used to the java-ish getFooMethod() anymore ;)
Update: adding in the comment from @dnlbrky:
dataFrame.rdd.getNumPartitions()