I want to get the maximum value from a date type column in a PySpark DataFrame. Currently, I am using a command like this:
df.select('col1').distinct().orderBy('col1').collect()[0]['col1']
Here "col1" is the datetime type column. It works fine but I want to avoid the use of collect() here as i am doubtful that my driver may get overflowed.
Any advice would be helpful.
No need to sort; you can just select the maximum:
from pyspark.sql.functions import col, max
res = df.select(max(col('col1')).alias('max_col1')).first().max_col1
Or you can use selectExpr:
res = df.selectExpr('max(col1) as max_col1').first().max_col1
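For reference, here is a minimal, self-contained sketch of the same idea using agg; the SparkSession setup and the sample dates are illustrative assumptions, not part of your code. The aggregation runs on the executors, so only a single row (the maximum) is sent back to the driver:

from datetime import date
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Small illustrative DataFrame with a date column named col1
df = spark.createDataFrame(
    [(date(2020, 1, 1),), (date(2021, 6, 15),), (date(2019, 12, 31),)],
    ['col1'],
)

# Aggregate on the executors; first() pulls back just one row
max_date = df.agg(F.max('col1').alias('max_col1')).first()['max_col1']
print(max_date)  # 2021-06-15

Any of these forms avoids collecting the full column, since the reduction happens before anything reaches the driver.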