I have this JSON file:
{"created_at":"2022-01-02 12:17:43.399 UTC","updated_at":"2022-01-02 12:17:43.399 UTC"}
I'm trying to read it as:
read_df = spark \
.read \
.option("timestampFormat", "yyyy-MM-dd HH:mm:ss.SSS 'UTC'") \
.option("inferSchema", "true") \
.json(path)
but the inferred schema gives me back
root
|-- created_at: string (nullable = true)
|-- updated_at: string (nullable = true)
I've tried forcing it via withColumn("timestamp", to_timestamp(col("created_at"), "yyyy-MM-dd HH:mm:ss.SSS 'UTC'"))
and it works (see the sketch below).
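For reference, a minimal sketch of that workaround (the path and column names follow the sample file above):
from pyspark.sql.functions import col, to_timestamp

# Read with the inferred schema (timestamps come back as strings),
# then cast the column manually
read_df = spark.read.json(path)
read_df = read_df.withColumn(
    "created_at",
    to_timestamp(col("created_at"), "yyyy-MM-dd HH:mm:ss.SSS 'UTC'")
)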
I don't want to provide the schema myself; I want Spark to infer it, because I have different files with different schemas and want to reuse the same reading function.
I'm not sure what's wrong.
Spark version: 3.3.2
Timestamp inference has to be enabled explicitly (docs, code):
Since version 3.0.1, the timestamp type inference is disabled by default. Set the JSON option inferTimestamp to true to enable such type inference.
read_df = spark \
.read \
.option("timestampFormat", "yyyy-MM-dd HH:mm:ss.SSS 'UTC'") \
.option("inferSchema", "true") \
.option("inferTimestamp", "true") \
.json(path)
now returns both columns as timestamps.
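A quick way to verify (the schema below is what Spark should infer for the sample file above):
read_df.printSchema()
# root
#  |-- created_at: timestamp (nullable = true)
#  |-- updated_at: timestamp (nullable = true)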