Defining DateType conversion for DataFrame schema in Spark

I am reading a DataFrame from a CSV file whose first column is an event date and time, e.g.

2016-08-08 07:45:28+03

In the code below, is it possible to specify within the schema definition how to convert such strings into dates?

val df:DataFrame = spark.read.options(Map(
  "header" -> "true"
)).schema(StructType(
    StructField("EventTime", DataTypes.DateType, false) ::
    Nil
)).csv("C:/qos1h.csv")

This code fails with

java.lang.NumberFormatException: For input string: "28+03"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:580)
    at java.lang.Integer.parseInt(Integer.java:615)
    at java.sql.Timestamp.valueOf(Timestamp.java:259)
    at org.apache.spark.sql.catalyst.util.DateTimeUtils$.stringToTime(DateTimeUtils.scala:135)
    at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:291)
    at org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:115)
    at org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:84)
    at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$1.apply(CSVFileFormat.scala:125)
    at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$1.apply(CSVFileFormat.scala:124)
asked Oct 19 '22 04:10 by Mikhail Tsaplin


1 Answer

It appears that string-to-date conversion cannot be specified in the schema definition itself. However, the documentation for the DataFrameReader.csv method describes how to set a date format string through the reader options; that format is then applied to every DateType field.

Here is the fixed code:

val df:DataFrame = spark.read.options(Map(
  "header" -> "true",
  "dateFormat" -> "yyyy-MM-dd HH:mm:ssX"
)).schema(StructType(
    StructField("EventTime", DataTypes.DateType, false) ::
    Nil
)).csv("C:/qos1h.csv")
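
To sanity-check the pattern without starting Spark, the sample value from the question can be parsed directly. This is only a sketch: it assumes the reader interprets the pattern with SimpleDateFormat-style letters (which formatter is actually used depends on the Spark version), and the object name DateFormatCheck is made up for illustration.

```scala
import java.text.SimpleDateFormat
import java.time.Instant

// Hypothetical helper, not part of Spark: checks the pattern against one value.
object DateFormatCheck {
  // Pattern from the answer; X matches an ISO 8601 offset such as +03.
  val pattern = "yyyy-MM-dd HH:mm:ssX"

  // Parse a single value and return the corresponding UTC instant.
  def parse(value: String): Instant = {
    val format = new SimpleDateFormat(pattern)
    format.parse(value).toInstant
  }

  def main(args: Array[String]): Unit = {
    // 07:45:28 at offset +03 is 04:45:28 in UTC.
    println(parse("2016-08-08 07:45:28+03")) // prints 2016-08-08T04:45:28Z
  }
}
```

Note that a DateType column keeps only the calendar date, so the time of day is dropped after parsing. If the time matters, a TimestampType column may be a better fit; the option that controls its format differs between Spark versions, so check the DataFrameReader.csv documentation for the version in use.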
answered Oct 30 '22 15:10 by Mikhail Tsaplin