 

Partitioning by multiple columns in Spark SQL

With Spark SQL's window functions, I need to partition by multiple columns to run my data queries, as follows:

val w = Window.partitionBy($"a").partitionBy($"b").rangeBetween(-100, 0)

I currently do not have a test environment (I am working on setting this up), but as a quick question: is this currently supported by Spark SQL's window functions, or will this not work?

asked Jun 13 '16 by Eric Staner

People also ask

How do I select multiple columns in Spark?

You can select one or more columns of a Spark DataFrame by passing the column names to the select() function. Since DataFrames are immutable, this returns a new DataFrame containing only the selected columns.
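For illustration, a minimal Scala sketch, assuming a DataFrame df with columns a, b, and c already in scope:

import org.apache.spark.sql.functions.col

// Select multiple columns by name; returns a new, immutable DataFrame.
val subset = df.select("a", "b")

// Equivalent form using Column expressions.
val subset2 = df.select(col("a"), col("b"))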

How do I partition a DataFrame in Spark?

Partition in memory: you can repartition a DataFrame by calling the repartition() or coalesce() transformations. Partition on disk: when writing a DataFrame back to disk, you can choose how to partition the data by column values using partitionBy() on pyspark.sql.DataFrameWriter.
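A sketch of both forms in Scala, assuming df is in scope and /tmp/output is a writable path of your choosing:

import org.apache.spark.sql.functions.col

// Partition in memory: redistribute rows across 8 partitions, keyed by column "a".
val repartitioned = df.repartition(8, col("a"))

// Reduce the number of partitions without a full shuffle.
val coalesced = repartitioned.coalesce(4)

// Partition on disk: one sub-directory per distinct value of "a".
df.write.partitionBy("a").parquet("/tmp/output")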

What is partition by in Spark SQL?

When writing a DataFrame to a disk or file system, the partitionBy() method partitions the output based on column values: Spark divides the records by the partition column and writes each partition's data into its own sub-directory.
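A sketch of the resulting directory layout, assuming a hypothetical column country with values US and DE:

import org.apache.spark.sql.functions.col

// Writing with partitionBy produces one sub-directory per column value:
df.write.partitionBy("country").parquet("/tmp/by_country")
//   /tmp/by_country/country=US/part-*.parquet
//   /tmp/by_country/country=DE/part-*.parquet

// Readers can then prune partitions by filtering on the partition column.
val usOnly = spark.read.parquet("/tmp/by_country").where(col("country") === "US")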

What are the types of partitioning in Spark?

Apache Spark supports two types of partitioning: "hash partitioning" and "range partitioning".
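Both are available directly on the DataFrame API (repartitionByRange requires Spark 2.3 or later); a minimal sketch, assuming a DataFrame df with a column a:

import org.apache.spark.sql.functions.col

// Hash partitioning: rows with equal keys land in the same partition.
val hashed = df.repartition(8, col("a"))

// Range partitioning: rows are split into sorted, contiguous key ranges.
val ranged = df.repartitionByRange(8, col("a"))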


1 Answer

This won't work. The second partitionBy will overwrite the first one. Both partition columns have to be specified in the same call:

val w = Window.partitionBy($"a", $"b").rangeBetween(-100, 0)
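As a usage note: in practice a range frame also needs an ordering column, or Spark will raise an analysis error when the window is used. A minimal sketch, assuming a SparkSession in scope and a DataFrame df with hypothetical columns a, b, ts, and value:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.sum
import spark.implicits._  // enables the $"col" syntax

// The frame covers rows in the same (a, b) partition whose ts is within
// [current ts - 100, current ts].
val w = Window.partitionBy($"a", $"b").orderBy($"ts").rangeBetween(-100, 0)

val result = df.withColumn("running_sum", sum($"value").over(w))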
answered Oct 17 '22 by zero323