I am trying to convert a column from String format to Date format using the to_date
function, but it's returning null values.
df.createOrReplaceTempView("incidents")
spark.sql("select Date from incidents").show()

+----------+
|      Date|
+----------+
|08/26/2016|
|08/26/2016|
|08/26/2016|
|06/14/2016|

spark.sql("select to_date(Date) from incidents").show()

+---------------------------+
|to_date(CAST(Date AS DATE))|
+---------------------------+
|                       null|
|                       null|
|                       null|
|                       null|
The Date column is in String format:
|-- Date: string (nullable = true)
Using strptime(), a date and time in string format can be converted to a datetime type. The first parameter is the string and the second is the date-time format specifier. One advantage of converting to a date type is that the month, day, or time can then be selected individually.
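As a minimal sketch of the strptime() call described above, the following uses Python's standard datetime module on the date string from the question. Note that Python's format directives (%m, %d, %Y) are written differently from the MM/dd/yyyy patterns Spark uses; the variable names are only illustrative.

```python
from datetime import datetime

# The date string from the question and the format that describes it:
# %m = two-digit month, %d = two-digit day, %Y = four-digit year.
raw = "08/26/2016"
parsed = datetime.strptime(raw, "%m/%d/%Y")

# Once parsed, the individual fields can be read directly.
print(parsed.date())   # 2016-08-26
print(parsed.month)    # 8
print(parsed.day)      # 26

# A format that does not match the string raises ValueError rather than
# quietly producing a missing value.
try:
    datetime.strptime(raw, "%Y-%m-%d")
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)  # True
```

The mismatch case is the Python counterpart of the nulls in the question: the parser needs to be told the layout of the string when it is not the default one.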
PySpark to_date() – Convert String to Date Format
The to_date() function converts a string (StringType) column to a date (DateType) column. It takes the date string column as its first argument and, as its second argument, the pattern that the date string is written in.
To work with the dates, we need to convert them into datetime format. One option is to convert the Pandas dataframe column from string to datetime using the pd.to_datetime() function. Before conversion, the data type of the 'Date' column is object, i.e. string.
Alternatively, we can convert the column to datetime format using the DataFrame.astype() function. After conversion, the 'Date' column has the datetime type.
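The two pandas conversions mentioned above might look like the following sketch. The sample values are hypothetical (they reuse the MM/dd/yyyy strings from the question), and the explicit format argument is an assumption chosen to match them.

```python
import pandas as pd

# Hypothetical sample data reusing the date strings from the question.
df = pd.DataFrame({"Date": ["08/26/2016", "08/26/2016", "06/14/2016"]})

# pd.to_datetime with an explicit format avoids any guessing about
# month/day order.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
print(df["Date"].dtype)

# astype() is an alternative when the strings are already ISO formatted.
iso = pd.Series(["2016-08-26", "2016-06-14"]).astype("datetime64[ns]")
print(iso.dtype)

# With a datetime column, individual parts are available through .dt
months = df["Date"].dt.month.tolist()
print(months)  # [8, 8, 6]
```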
In SQL Server, implicit conversion of a string to a date depends on the string's date format and the default language settings (regional settings). If the date stored within a string is in an ISO format, yyyyMMdd or yyyy-MM-ddTHH:mm:ss(.mmm), it can be converted regardless of the regional settings; otherwise the date must have a supported format ...
You may refer to the following source for the different formats that you may apply. For our example, the complete Python code to convert the strings to datetime would be:

import pandas as pd

values = {'dates': ['20190902','20190913','20190921'],
          'status': ['Opened','Opened','Closed']
          }

df = pd.DataFrame (values, ...
Use to_date with Java SimpleDateFormat.
TO_DATE(CAST(UNIX_TIMESTAMP(date, 'MM/dd/yyyy') AS TIMESTAMP))
Example:
spark.sql("""
  SELECT TO_DATE(CAST(UNIX_TIMESTAMP('08/26/2016', 'MM/dd/yyyy') AS TIMESTAMP)) AS newdate"""
).show()

+----------+
|   newdate|
+----------+
|2016-08-26|
+----------+
I solved the same problem without the temp table/view, using DataFrame functions.
However, I found that only one format works with this solution: yyyy-MM-dd.
For example:
val df = sc.parallelize(Seq("2016-08-26")).toDF("Id")
val df2 = df.withColumn("Timestamp", (col("Id").cast("timestamp")))
val df3 = df2.withColumn("Date", (col("Id").cast("date")))

df3.printSchema

root
 |-- Id: string (nullable = true)
 |-- Timestamp: timestamp (nullable = true)
 |-- Date: date (nullable = true)

df3.show

+----------+--------------------+----------+
|        Id|           Timestamp|      Date|
+----------+--------------------+----------+
|2016-08-26|2016-08-26 00:00:...|2016-08-26|
+----------+--------------------+----------+
The timestamp of course has 00:00:00.0 as a time value.