Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert date to end of month in Spark

I have a Spark DataFrame as shown below:

#Create DataFrame    
df <- data.frame(name = c("Thomas", "William", "Bill", "John"),
      dates = c('2017-01-05', '2017-02-23', '2017-03-16', '2017-04-08'))
df <- createDataFrame(df)

#Make sure df$dates column is in 'date' format    
df <- withColumn(df, 'dates', cast(df$dates, 'date'))

name    | dates
--------------------
Thomas  |2017-01-05
William |2017-02-23
Bill    |2017-03-16
John    |2017-04-08

I want to change dates to the end of month date, so they would look like shown below. How do I do this? Either SparkR or PySpark code is fine.

name    | dates
--------------------
Thomas  |2017-01-31
William |2017-02-28
Bill    |2017-03-31
John    |2017-04-30
like image 518
Gaurav Bansal Avatar asked Jun 21 '17 21:06

Gaurav Bansal


People also ask

How do I get last day of month in spark Scala?

Spark has a last_day() and add_months() that you can use to get the last day of previous month. Use the add_months() with -1 to get the previous month, then use the last_day() to get the last day of the month. Here is the sample Scala code.

How do you get the start date in the month in Pyspark?

In order to get month, year and quarter from pyspark we will be using month(), year() and quarter() function respectively. year() Function with column name as argument extracts year from date in pyspark. month() Function with column name as argument extracts month from date in pyspark.

How do I change the date format in spark data frame?

(i) By applying to_date , we are changing the datatype of this column (string) to Date datatype. Also, we are informing to_date that the format in this string column is yyyy-MM-dd so read the column accordingly. (ii) Next, we are applying date_format to achieve the date format we require which is MM/dd/yyyy .


1 Answers

You may use the following (PySpark):

from pyspark.sql.functions import last_day

df.select('name', last_day(df.dates).alias('dates')).show()

To clarify, last_day(date) returns the last day of the month of which date belongs to.

I'm pretty sure there is a similar function in sparkR https://spark.apache.org/docs/1.6.2/api/R/last_day.html

like image 67
Leonard Aukea Avatar answered Sep 21 '22 17:09

Leonard Aukea