I am trying to parse a date using to_date(),
but I get the following exception:
SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to parse '12/1/2010 8:26' in the new parser. You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string.
The exception suggests I should use the legacy time parser, but for starters I don't know how to set it to LEGACY.
Here is my implementation:
dfWithDate = df.withColumn("date", to_date(col("InvoiceDate"), "MM/dd/yyyy"))
My dates are in the following format:
+--------------+
|   InvoiceDate|
+--------------+
|12/1/2010 8:26|
|12/1/2010 8:26|
|12/1/2010 8:26|
|12/1/2010 8:26|
|12/1/2010 8:26|
+--------------+
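We can use either to_timestamp or from_unixtime(unix_timestamp()) to parse such strings. For ISO-style strings, try a pattern such as "yyyy-MM-dd'T'hh:mm'Z'", enclosing the literals T and Z in single quotes. Note that hh is the 12-hour clock field; use HH when the hour runs from 0 to 23.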
In PySpark, use the date_format() function to convert a DataFrame column from Date to String format. It takes the column and a pattern and returns the date formatted as a string.
The default format of the PySpark Date is yyyy-MM-dd. current_date() returns the current date as a date column. date_format() converts a date/timestamp/string to a string in the format given by its second argument. to_date(), when called without a format, converts the column into `DateType` by casting rules.
Note that Spark date functions accept Java-style date patterns such as those specified in DateTimeFormatter. The to_date() function converts a string (StringType) column to a date (DateType) column.
Spark SQL provides built-in standard Date and Timestamp (date and time) functions defined in the DataFrame API; these come in handy when we need to perform operations on dates and times. All of them accept input of Date type, Timestamp type, or String.
The Spark SQL timestamp functions operate on both date and timestamp values. The default format of the Spark Timestamp is yyyy-MM-dd HH:mm:ss.SSSS
from pyspark.sql.functions import col, to_date

spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
df.withColumn("date", to_date(col("InvoiceDate"), "MM/dd/yyyy")).show()
+--------------+----------+
|   InvoiceDate|      date|
+--------------+----------+
|12/1/2010 8:26|2010-12-01|
+--------------+----------+
# In the code above, spark refers to the SparkSession.
You can keep using the new Spark 3 parser by parsing the string into a timestamp first, then casting it to a date:
from pyspark.sql import functions as F
dfWithDate = df.withColumn("date", F.to_date(F.to_timestamp(F.col("InvoiceDate"), "M/d/yyyy H:mm")))
dfWithDate.show()
#+--------------+----------+
#|   InvoiceDate|      date|
#+--------------+----------+
#| 2/1/2010 8:26|2010-02-01|
#| 2/1/2010 8:26|2010-02-01|
#| 2/1/2010 8:26|2010-02-01|
#| 2/1/2010 8:26|2010-02-01|
#|12/1/2010 8:26|2010-12-01|
#+--------------+----------+
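Why the full pattern matters can be seen without Spark at all. The sketch below is only an analogy: it uses Python's standard datetime.strptime, not Spark's parser, and the variable names are made up for the example. Like the new Spark 3 parser, strptime refuses a pattern that leaves part of the input unconsumed.

```python
from datetime import datetime

raw = "12/1/2010 8:26"

# A date-only pattern leaves " 8:26" unconsumed, so strict parsing rejects it.
try:
    datetime.strptime(raw, "%m/%d/%Y")
    date_only_ok = True
except ValueError:
    date_only_ok = False

# A pattern that covers both the date and the time consumes the whole string.
parsed = datetime.strptime(raw, "%m/%d/%Y %H:%M")

print(date_only_ok, parsed.date())  # False 2010-12-01
```

The analogy is not exact: Python's directives differ from Spark's pattern letters, and Spark's "dd" additionally expects a two-digit day, which is why the answer above uses "M/d/yyyy H:mm".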