I'm running some operations in PySpark, and recently increased the number of nodes in my configuration (which is on Amazon EMR). However, even though I tripled the number of nodes (from 4 to 12), performance seems not to have changed. As such, I'd like to see if the new nodes are visible to Spark.
I'm checking the following value:

>>> sc.defaultParallelism
2
But I think this is telling me the total number of tasks distributed to each node, not the total number of nodes that Spark can see.
How do I go about seeing the number of nodes that PySpark is using in my cluster?
One way to estimate this from the cluster configuration: number of available executors = (total cores / num-cores-per-executor) = 150 / 5 = 30.
To find the core resources actually available to the Spark application, multiply the number of cluster cores by the YARN utilization percentage; in this example that provides 3 driver and 30 worker-node cores. Determine the memory resources available in the same way, by multiplying the cluster RAM size by the YARN utilization percentage.
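If it helps to see that arithmetic in one place, here is a minimal Python sketch of the same estimate; the figures for total cores, cores per executor, YARN utilization, and cluster RAM are illustrative assumptions, not values read from a real cluster:

# Back-of-the-envelope sizing; every number below is an assumption.
TOTAL_CORES = 150          # assumed total vCPUs across the cluster
CORES_PER_EXECUTOR = 5     # assumed spark.executor.cores
YARN_UTILIZATION = 0.9     # assumed fraction of resources YARN can allocate
CLUSTER_RAM_GB = 600       # assumed total RAM across the cluster

available_executors = TOTAL_CORES // CORES_PER_EXECUTOR   # 150 / 5 = 30
usable_cores = int(TOTAL_CORES * YARN_UTILIZATION)
usable_ram_gb = CLUSTER_RAM_GB * YARN_UTILIZATION

print("available executors:", available_executors)
print("cores usable by YARN:", usable_cores)
print("RAM usable by YARN (GB):", usable_ram_gb)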
Go to the Spark History Server UI and open the incomplete application. Locate the application ID that you found above and open it. Go to the Executors tab, and you will see the list of executors.
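If you would rather script that check than click through the UI, Spark's monitoring REST API exposes the same executor list as JSON. Below is a minimal sketch; the History Server address (port 18080) is an assumption for a typical EMR master node, and sc is assumed to be an existing SparkContext:

import requests

# sc is the SparkContext from the running PySpark session.
app_id = sc.applicationId                      # e.g. "application_16..._0001"
base = "http://localhost:18080/api/v1"         # assumed History Server address
executors = requests.get(f"{base}/applications/{app_id}/executors").json()

# The driver appears in this list with id == "driver"; the rest are executors.
workers = [e for e in executors if e["id"] != "driver"]
print(len(workers), "executors visible to Spark")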
A Resilient Distributed Dataset (RDD) is the basic abstraction in Spark: it represents an immutable, partitioned collection of elements that can be operated on in parallel.
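As a small illustration of that "partitioned collection" idea, and of why defaultParallelism reports partitions rather than nodes, the following sketch assumes sc is an existing SparkContext:

# Build an RDD from a local range, explicitly split into 8 partitions.
rdd = sc.parallelize(range(1000), numSlices=8)

# This reports partitions, which Spark processes in parallel across
# executors; it is not the number of nodes in the cluster.
print(rdd.getNumPartitions())   # 8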
In PySpark you can still call the Scala getExecutorMemoryStatus API using PySpark's Py4J bridge:

sc._jsc.sc().getExecutorMemoryStatus().size()
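One caveat worth adding: the map returned by getExecutorMemoryStatus typically includes the driver's block manager as well, so a common convention is to subtract one to get the worker count. A sketch, assuming sc is an existing SparkContext:

# getExecutorMemoryStatus() returns a map keyed by "host:port" for every
# block manager, which usually includes the driver, hence the minus one.
num_block_managers = sc._jsc.sc().getExecutorMemoryStatus().size()
num_executors = num_block_managers - 1
print("executors (excluding driver):", num_executors)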