I have a column in a Spark DataFrame of String datatype (with dates in the yyyy-MM-dd pattern), and I want to display the column values in the MM/dd/yyyy pattern.
My data is:
val df = sc.parallelize(Array(
  ("steak", "1990-01-01", "2000-01-01", 150),
  ("steak", "2000-01-02", "2001-01-13", 180),
  ("fish",  "1990-01-01", "2001-01-01", 100)
)).toDF("name", "startDate", "endDate", "price")
df.show()
+-----+----------+----------+-----+
| name| startDate|   endDate|price|
+-----+----------+----------+-----+
|steak|1990-01-01|2000-01-01|  150|
|steak|2000-01-02|2001-01-13|  180|
| fish|1990-01-01|2001-01-01|  100|
+-----+----------+----------+-----+
df.printSchema()
root
 |-- name: string (nullable = true)
 |-- startDate: string (nullable = true)
 |-- endDate: string (nullable = true)
 |-- price: integer (nullable = false)
I want to show endDate in the MM/dd/yyyy pattern. All I have been able to do is cast the column from String to DateType:
import org.apache.spark.sql.types.DateType

val df2 = df.select($"endDate".cast(DateType).alias("endDate"))
df2.show()
+----------+
|   endDate|
+----------+
|2000-01-01|
|2001-01-13|
|2001-01-01|
+----------+
df2.printSchema()
root
 |-- endDate: date (nullable = true)
I want to show endDate in the MM/dd/yyyy pattern. The only reference I found is this, which doesn't solve the problem.
You can use the date_format function.
  import sqlContext.implicits._
  import org.apache.spark.sql.functions._

  val df = sc.parallelize(Array(
    ("steak", "1990-01-01", "2000-01-01", 150),
    ("steak", "2000-01-02", "2001-01-13", 180),
    ("fish", "1990-01-01", "2001-01-01", 100))).toDF("name", "startDate", "endDate", "price")
  df.show()

  // date_format renders a date (or a date-parsable string) in the given pattern
  df.select(date_format(col("endDate"), "MM/dd/yyyy")).show()
Output:
+-------------------------------+
|date_format(endDate,MM/dd/yyyy)|
+-------------------------------+
|                     01/01/2000|
|                     01/13/2001|
|                     01/01/2001|
+-------------------------------+
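If you want to keep the other columns and only reformat endDate in place, here is a minimal sketch along the same lines (the withColumn call and the reformatted name are my additions, not part of the answer above):

  // Overwrite endDate with its MM/dd/yyyy rendering; the other columns pass through
  val reformatted = df.withColumn("endDate", date_format(col("endDate"), "MM/dd/yyyy"))
  reformatted.show()

Note that the result is a String column again: Spark's DateType always displays as yyyy-MM-dd, so any custom display pattern has to live in a string.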
Use date_format(date, format) (pyspark.sql.functions.date_format in PySpark; the Scala equivalent lives in org.apache.spark.sql.functions):
val df2 = df.select(date_format($"endDate", "MM/dd/yyyy").alias("endDate"))
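If the input strings might arrive in some other pattern, you can parse them explicitly before formatting. A small sketch, assuming the imports from the answer above and Spark 2.2+ (where to_date accepts an input pattern; the df3 name is mine):

// to_date parses the string with an explicit input pattern,
// date_format then renders the parsed date in the target pattern
val df3 = df.select(date_format(to_date($"endDate", "yyyy-MM-dd"), "MM/dd/yyyy").alias("endDate"))
df3.show()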