
to_date fails to parse date in Spark 3.0

I am trying to parse a date using to_date(), but I get the following exception:

SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to parse '12/1/2010 8:26' in the new parser. You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string.

The exception suggests using the legacy time parser, but for a start, I don't know how to set it to LEGACY.

Here is my implementation:

dfWithDate = df.withColumn("date", to_date(col("InvoiceDate"), "MM/dd/yyyy"))

My dates are in the following format:

+--------------+
|   InvoiceDate|
+--------------+
|12/1/2010 8:26|
|12/1/2010 8:26|
|12/1/2010 8:26|
|12/1/2010 8:26|
|12/1/2010 8:26|
+--------------+
noobie-php asked Jul 16 '20 21:07



2 Answers

from pyspark.sql.functions import col, to_date

spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
df.withColumn("date", to_date(col("InvoiceDate"), "MM/dd/yyyy")).show()


+--------------+----------+
|   InvoiceDate|      date|
+--------------+----------+
|12/1/2010 8:26|2010-12-01|
+--------------+----------+

# In the code above, spark refers to the SparkSession.
suresiva answered Oct 24 '22 05:10


You can keep using the new Spark 3 parser by parsing the string into a timestamp first, then casting it to a date:

from pyspark.sql import functions as F

dfWithDate = df.withColumn("date", F.to_date(F.to_timestamp(F.col("InvoiceDate"), "M/d/yyyy H:mm")))

dfWithDate.show()
#+--------------+----------+
#|   InvoiceDate|      date|
#+--------------+----------+
#| 2/1/2010 8:26|2010-02-01|
#| 2/1/2010 8:26|2010-02-01|
#| 2/1/2010 8:26|2010-02-01|
#| 2/1/2010 8:26|2010-02-01|
#|12/1/2010 8:26|2010-12-01|
#+--------------+----------+
blackbishop answered Oct 24 '22 07:10