
How to calculate Max(Date) and Min(Date) for DateType in pyspark dataframe?




The dataframe has a date column in string type '2017-01-01'

It is converted to DateType():

from pyspark.sql.functions import col
from pyspark.sql.types import DateType

df = df.withColumn('date', col('date_string').cast(DateType()))

I would like to calculate the first day and last day in the column. I tried the following code, but it does not work. Can anyone give any suggestions? Thanks!

df.select('date').min()
df.select('date').max()

df.select('date').last_day()
df.select('date').first_day()