
Spark and profiling or execution plan

Is there any tool in Spark that helps to understand how the code is interpreted and executed, such as a profiling tool or the details of an execution plan, to help optimize the code?

For instance, I have seen that it is better to partition two DataFrames on the join key before joining them to avoid an extra shuffle. How can we figure that out?
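A rough sketch of the pattern I mean (the tables and column names below are just placeholders I made up):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    val spark = SparkSession.builder().appName("join-demo").master("local[*]").getOrCreate()
    import spark.implicits._

    // Hypothetical DataFrames; "id" is the join key.
    val orders    = Seq((1, "book"), (2, "pen")).toDF("id", "item")
    val customers = Seq((1, "Alice"), (2, "Bob")).toDF("id", "name")

    // Pre-partition both sides on the join key before joining,
    // so that the join itself does not have to shuffle again.
    val ordersByKey    = orders.repartition(col("id"))
    val customersByKey = customers.repartition(col("id"))

    val joined = ordersByKey.join(customersByKey, "id")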

asked Apr 02 '17 by GPif

People also ask

What is profiling in Spark?

The Profiling tool analyzes both CPU- and GPU-generated event logs and produces information that can be used for debugging and profiling Apache Spark applications. The output includes the Spark version, executor details, properties, etc.

Why should you examine your execution plan in Spark?

You can use it to see what execution plan Spark will use for your query without actually running it. Spark also provides the Spark UI, where you can view the execution plan and other details while the job is running.
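For example, a minimal sketch (made-up data, local SparkSession) that prints the physical plan of a query without executing it:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("plan-demo").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq(("a", 1), ("b", 2), ("a", 3)).toDF("key", "value")

    // Prints the physical plan only; no job is actually run.
    df.groupBy("key").sum("value").explain()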

What is Spark plan?

SparkPlan is a recursive data structure in Spark SQL's Catalyst tree-manipulation framework. It represents both a single physical operator in a physical execution plan and the physical execution plan itself, i.e. a tree of physical operators for a structured query.
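As a sketch (made-up data again), this SparkPlan tree can also be inspected programmatically through queryExecution:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("sparkplan-demo").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq(("a", 1), ("b", 2)).toDF("key", "value")

    // executedPlan is the tree of physical operators that will actually run,
    // i.e. the same tree that explain() prints.
    val physicalPlan = df.groupBy("key").count().queryExecution.executedPlan
    println(physicalPlan.treeString)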

What is logical execution plan in Spark?

In layman's terms, a logical plan is a tree that represents both the schema and the data. These trees are manipulated and optimized by the Catalyst framework. The logical plan is divided into three parts: the parsed (unresolved) logical plan, the analyzed (resolved) logical plan, and the optimized logical plan.
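A minimal sketch, assuming a spark-shell session (so spark and its implicits are already in scope), that prints all of these stages using the extended mode of explain:

    import spark.implicits._

    val df = Seq(("a", 1), ("b", 2), ("a", 3)).toDF("key", "value")

    // true = extended: prints the parsed, analyzed and optimized logical plans,
    // followed by the physical plan.
    df.groupBy("key").sum("value").explain(true)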


1 Answer

As Pushkr said, with DataFrames and Datasets we can use the .explain() method to display the plan derivation, the partitioning and any eventual shuffle.
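For example, here is a sketch with made-up tables (auto-broadcast is disabled so the shuffle-based join plan shows up even on tiny data); in the output, Exchange nodes mark the shuffles:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    val spark = SparkSession.builder().appName("explain-join").master("local[*]").getOrCreate()
    import spark.implicits._
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

    val left  = Seq((1, "a"), (2, "b")).toDF("id", "l")
    val right = Seq((1, "x"), (2, "y")).toDF("id", "r")

    // Plain join: the plan shows an Exchange (shuffle) on each side of the join.
    left.join(right, "id").explain()

    // Pre-partitioned join: both sides are already hash-partitioned on the key,
    // so the join itself adds no further Exchange nodes.
    left.repartition(col("id")).join(right.repartition(col("id")), "id").explain()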

With RDDs we can use toDebugString for roughly the same result. There is also the dependencies method, which indicates whether the new RDD derives from the previous one through a narrow or a wide dependency.
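A small sketch on a toy RDD (names made up) showing both:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("rdd-lineage").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    val words  = sc.parallelize(Seq("a", "b", "a", "c"))
    val pairs  = words.map(w => (w, 1))       // narrow dependency
    val counts = pairs.reduceByKey(_ + _)     // wide (shuffle) dependency

    // Lineage as an indented string; a new indentation level marks a shuffle boundary.
    println(counts.toDebugString)

    // Dependency types of the last RDD on its parent(s),
    // e.g. ShuffleDependency for reduceByKey.
    counts.dependencies.foreach(d => println(d.getClass.getSimpleName))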

answered Nov 25 '22 by GPif