Column alias after groupBy in pyspark

Tags: python, scala, apache-spark, apache-spark-sql, pyspark

I need the DataFrame produced by the line below to use the alias name "maxDiff" for the max('diff') column after groupBy. However, the line below neither makes any change nor throws an error.


    grpdf = joined_df.groupBy(temp1.datestamp).max('diff').alias("maxDiff")


asked Nov 04 '15 07:11

mhn

2 Answers

You can use agg instead of calling the max method, and alias the aggregated Column inside it:


    # Note: this import shadows Python's built-in max in the current module
    from pyspark.sql.functions import max

    grpdf = joined_df.groupBy(temp1.datestamp).agg(max("diff").alias("maxDiff"))

Similarly, in Scala:


    import org.apache.spark.sql.functions.max
    import spark.implicits._  // for the $"colName" syntax; assumes a SparkSession named spark

    joined_df.groupBy($"datestamp").agg(max("diff").alias("maxDiff"))

Or, equivalently, using as instead of alias:

    joined_df.groupBy($"datestamp").agg(max("diff").as("maxDiff"))

answered Sep 27 '22 00:09

zero323

This is because you are aliasing the whole DataFrame object, not the Column. Here's an example of how to alias only the Column:


    import pyspark.sql.functions as func

    # Rename the auto-generated "max(diff)" column and keep the grouping column too
    grpdf = joined_df \
        .groupBy(temp1.datestamp) \
        .max('diff') \
        .select("datestamp", func.col("max(diff)").alias("maxDiff"))

answered Sep 23 '22 00:09

Nhor
