
How to write null value from Spark sql expression of DataFrame to a database table? (IllegalArgumentException: Can't get JDBC type for null)

I receive the error java.lang.IllegalArgumentException: Can't get JDBC type for null when I try to run the following example:

    ...
    import org.apache.spark.sql.{SaveMode, SparkSession}

    val spark = SparkSession.builder
      .master("local[*]")
      .appName("Demo")
      .getOrCreate()

    import spark.implicits._

    // load the first table
    val df_one = spark.read
      .format("jdbc")
      .option("url", myDbUrl)
      .option("dbtable", myTableOne)
      .option("user", myUser)
      .option("password", myPassw)
      .load()

    df_one.createGlobalTempView("table_one")

    // load the second table
    val df_two = spark.read
      .format("jdbc")
      .option("url", myDbUrl)
      .option("dbtable", myTableTwo)
      .option("user", myUser)
      .option("password", myPassw)
      .load()

    df_two.createGlobalTempView("table_two")

    // perform the join of the two tables
    val df_result = spark.sql(
      "select o.field_one, t.field_two, null as field_three " +
      "from global_temp.table_one o, global_temp.table_two t where o.key_one = t.key_two"
    )

    // the error occurs here:
    df_result.write
      .format("jdbc")
      .option("dbtable", myResultTable)
      .option("url", myDbUrl)
      .option("user", myUser)
      .option("password", myPassw)
      .mode(SaveMode.Append)
      .save()
    ...

I receive the error:

    Exception in thread "main" java.lang.IllegalArgumentException: Can't get JDBC type for null
                at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getJdbcType$2.apply(JdbcUtils.scala:148)
                at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getJdbcType$2.apply(JdbcUtils.scala:148)
                at scala.Option.getOrElse(Option.scala:121)
                at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getJdbcType(JdbcUtils.scala:147)
                at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$18.apply(JdbcUtils.scala:663)
                at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$18.apply(JdbcUtils.scala:662)
                at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
                at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
                at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
                at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
                at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
                at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
                at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.saveTable(JdbcUtils.scala:662)
                at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:77)
                at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426)
                at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)

A workaround, which drastically slows down the workflow:

    ...
    import org.apache.spark.sql.types.IntegerType

    // create a case class for the Dataset
    case class ResultCaseClass(field_one: Option[Int], field_two: Option[Int], field_three: Option[Int])

    // perform the join of the two tables
    val ds_result = spark.sql(
      "select o.field_one, t.field_two, null as field_three " +
      "from global_temp.table_one o, global_temp.table_two t where o.key_one = t.key_two"
    )
      .withColumn("field_one", $"field_one".cast(IntegerType))
      .withColumn("field_two", $"field_two".cast(IntegerType))
      .withColumn("field_three", $"field_three".cast(IntegerType))
      .as[ResultCaseClass]

    // success:
    ds_result.write......
    ...
asked Mar 21 '17 by Vladislav Dotsenko

1 Answer

I encountered the same problem. Digging into the Spark source code, I found where the error comes from: if you insert a null value into a database without specifying its data type, Spark cannot map it to a JDBC type and throws "Can't get JDBC type for null". The way to fix this is to cast the null to the data type that matches the corresponding database field's type.

example:

    lit(null).cast(StringType)
    // or, equivalently:
    lit(null).cast("string")
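Applied to the question's query, a minimal sketch of the fix (assuming field_three maps to an integer column in myResultTable; the actual column type is an assumption, so adjust the cast to your schema):

    import org.apache.spark.sql.functions.lit
    import org.apache.spark.sql.types.IntegerType

    // Cast the null directly in SQL so the column carries a concrete type:
    val df_result = spark.sql(
      "select o.field_one, t.field_two, cast(null as int) as field_three " +
      "from global_temp.table_one o, global_temp.table_two t where o.key_one = t.key_two"
    )

    // Equivalent fix with the DataFrame API:
    // df_result.withColumn("field_three", lit(null).cast(IntegerType))

With the null column typed, the original df_result.write ... .mode(SaveMode.Append).save() call should succeed without the Dataset/case-class workaround.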
answered Nov 14 '22 by lmnzh