I'm using PySpark for a machine learning project. I have many records with a field that stores a date read from MongoDB. The value is a string, but it contains a date in ISO 8601 format.
How can I convert it to one of the date types supported by Apache Spark? If possible, I'd like to convert the entire column that contains this field.
Here's an example of this field in a JSON format:
"date": "2020-11-09T07:27:57.078Z"
Just cast the column to a timestamp with df.select(F.col('date').cast('timestamp')) — Spark parses ISO 8601 strings like this one natively, including the trailing Z (which marks UTC). If you only need the calendar date, cast to date instead. Note that the displayed value is rendered in the session time zone (spark.sql.session.timeZone), so the output below assumes a UTC session.
import pyspark.sql.functions as F
df = spark.createDataFrame([['2020-11-09T07:27:57.078Z']]).toDF('date')
df.show(truncate=False)
+------------------------+
|date                    |
+------------------------+
|2020-11-09T07:27:57.078Z|
+------------------------+

df.printSchema()
root
 |-- date: string (nullable = true)
# cast to timestamp
df2 = df.select(F.col('date').cast('timestamp'))
df2.printSchema()
root
 |-- date: timestamp (nullable = true)

df2.show(truncate=False)
+-----------------------+
|date                   |
+-----------------------+
|2020-11-09 07:27:57.078|
+-----------------------+
# cast to date
df3 = df.select(F.col('date').cast('date'))
df3.printSchema()
root
 |-- date: date (nullable = true)

df3.show()
+----------+
|      date|
+----------+
|2020-11-09|
+----------+