How to set JDBC partitionColumn type to Date in Spark 2.4.1

I am trying to retrieve data from Oracle using Spark SQL 2.4.1. I tried to set the JDBC options as below:

    .option("lowerBound", "31-MAR-02");
    .option("upperBound", "01-MAY-19");
    .option("partitionColumn", "data_date");
    .option("numPartitions", 240);

But it gives this error:

    java.lang.IllegalArgumentException: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]
        at java.sql.Timestamp.valueOf(Timestamp.java:204)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.toInternalBoundValue(JDBCRelation.scala:179)

Then I tried this:

    .option("lowerBound", "2002-03-31"); //changed the date format
    .option("upperBound", "2019-05-02");
    .option("partitionColumn", "data_date"); 
    .option("numPartitions", 240);

Still no luck. What is the correct way to pass a date as lowerBound/upperBound? Is there a way to specify the data type of an option parameter?

Part 2: I checked the options more carefully. They were being overwritten somewhere before the query executed. After correcting that, the first error was resolved.
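
(As an aside, the overwrite is easy to reproduce: DataFrameReader stores its options in a map, so a later .option call with the same key silently replaces the earlier value. An illustrative sketch:)

    // Illustrative only: the second call wins, so the bound that actually
    // reaches the JDBC source is whichever value was set last.
    val reader = spark.read.format("jdbc")
      .option("lowerBound", "31-MAR-02")   // set here ...
      .option("lowerBound", "2002-03-31")  // ... silently replaced here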

But with the options below:

.option("lowerBound", "2002-03-31 00:00:00"); 
.option("upperBound", "2019-05-01 23:59:59");
.option("timestampFormat", "yyyy-mm-dd hh:mm:ss");

Query string:

    query -> ( SELECT * FROM MODEL_VALS ) T

It throws another error:

    java.sql.SQLException: ORA-12801: error signaled in parallel query server P022, instance nj0005
    ORA-01861: literal does not match format string
asked May 03 '19 by BdEngineer



2 Answers

If you are using Oracle, see https://github.com/apache/spark/blob/master/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala#L441

val df1 = spark.read.format("jdbc")
      .option("url", jdbcUrl)
      .option("dbtable", "datetimePartitionTest")
      .option("partitionColumn", "d")
      .option("lowerBound", "2018-07-06")
      .option("upperBound", "2018-07-20")
      .option("numPartitions", 3)
      // oracle.jdbc.mapDateToTimestamp defaults to true. If this flag is not disabled, column d
      // (Oracle DATE) will be resolved as Catalyst Timestamp, which will fail bound evaluation of
      // the partition column. E.g. 2018-07-06 cannot be evaluated as Timestamp, and the error
      // message says: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff].
      .option("oracle.jdbc.mapDateToTimestamp", "false")
      .option("sessionInitStatement", "ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD'")
      .load()
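
Adapted to the question's setup, a minimal sketch could look like this (MODEL_VALS and data_date come from the question; jdbcUrl and the exact bound values are assumptions):

    // Sketch only: applies the same two options to the question's table.
    val df = spark.read.format("jdbc")
      .option("url", jdbcUrl)
      .option("dbtable", "( SELECT * FROM MODEL_VALS ) T")
      .option("partitionColumn", "data_date")
      .option("lowerBound", "2002-03-31")
      .option("upperBound", "2019-05-01")
      .option("numPartitions", 240)
      // Keep Oracle DATE as Catalyst DateType so the bounds parse as dates.
      .option("oracle.jdbc.mapDateToTimestamp", "false")
      // Make the Oracle session's date format agree with java.sql.Date.
      .option("sessionInitStatement",
        "ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD'")
      .load()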
answered Sep 18 '22 by alexm


I stumbled on this question while solving a similar problem. In my case, Spark 2.4.2 was sending dates to Oracle in the format 'yyyy-MM-dd HH:mm:ss.ssss', and Oracle returned "Not a valid month" because it expected 'dd-MMM-yy HH:mm:ss.ssss'. To solve it I followed the Spark GitHub link, which says:

Override the beforeFetch method in OracleDialect to do the following two things:

- Set Oracle's NLS_TIMESTAMP_FORMAT to "YYYY-MM-DD HH24:MI:SS.FF" to match the java.sql.Timestamp format.
- Set Oracle's NLS_DATE_FORMAT to "YYYY-MM-DD" to match the java.sql.Date format.

And it solved the issue. Hope it helps.
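
On Spark versions where the built-in OracleDialect does not do this yet, a user-side sketch of the same idea is to register a custom JdbcDialect whose beforeFetch sets those session formats. beforeFetch and JdbcDialects.registerDialect are real Spark developer APIs; the object name below is made up:

    import java.sql.Connection
    import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

    // Sketch of a custom dialect (hypothetical name) that sets the Oracle
    // session formats once per connection, before Spark fetches any rows.
    object OracleSessionFormatDialect extends JdbcDialect {
      override def canHandle(url: String): Boolean =
        url.startsWith("jdbc:oracle")

      override def beforeFetch(connection: Connection,
                               properties: Map[String, String]): Unit = {
        val stmt = connection.createStatement()
        try {
          // Match java.sql.Timestamp's format.
          stmt.execute("ALTER SESSION SET NLS_TIMESTAMP_FORMAT = 'YYYY-MM-DD HH24:MI:SS.FF'")
          // Match java.sql.Date's format.
          stmt.execute("ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD'")
        } finally {
          stmt.close()
        }
      }
    }

    // Register it before creating the JDBC DataFrame. Note that a registered
    // dialect takes precedence over the built-in OracleDialect for matching
    // URLs, so in practice you would also replicate its type mappings.
    JdbcDialects.registerDialect(OracleSessionFormatDialect)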

answered Sep 17 '22 by Ankush