
How to write null value from Spark sql expression of DataFrame to a database table? (IllegalArgumentException: Can't get JDBC type for null)

I receive the error java.lang.IllegalArgumentException: Can't get JDBC type for null when I try to run the following example:

    ...
    import org.apache.spark.sql.{SaveMode, SparkSession}

    val spark = SparkSession.builder
      .master("local[*]")
      .appName("Demo")
      .getOrCreate()

    import spark.implicits._

    // load the first table
    val df_one = spark.read
      .format("jdbc")
      .option("url", myDbUrl)
      .option("dbtable", myTableOne)
      .option("user", myUser)
      .option("password", myPassw)
      .load()

    df_one.createGlobalTempView("table_one")

    // load the second table
    val df_two = spark.read
      .format("jdbc")
      .option("url", myDbUrl)
      .option("dbtable", myTableTwo)
      .option("user", myUser)
      .option("password", myPassw)
      .load()

    df_two.createGlobalTempView("table_two")

    // perform the join of the two tables
    val df_result = spark.sql(
      "select o.field_one, t.field_two, null as field_three " +
      "from global_temp.table_one o, global_temp.table_two t where o.key_one = t.key_two"
    )

    // the error occurs here:
    df_result.write
      .format("jdbc")
      .option("dbtable", myResultTable)
      .option("url", myDbUrl)
      .option("user", myUser)
      .option("password", myPassw)
      .mode(SaveMode.Append)
      .save()
    ...

I receive the error:

    Exception in thread "main" java.lang.IllegalArgumentException: Can't get JDBC type for null
                at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getJdbcType$2.apply(JdbcUtils.scala:148)
                at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getJdbcType$2.apply(JdbcUtils.scala:148)
                at scala.Option.getOrElse(Option.scala:121)
                at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$getJdbcType(JdbcUtils.scala:147)
                at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$18.apply(JdbcUtils.scala:663)
                at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$18.apply(JdbcUtils.scala:662)
                at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
                at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
                at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
                at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
                at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
                at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
                at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.saveTable(JdbcUtils.scala:662)
                at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:77)
                at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426)
                at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)

A workaround, which drastically slows down the workflow:

    ...
    import org.apache.spark.sql.types.IntegerType

    // create a case class for the Dataset
    case class ResultCaseClass(field_one: Option[Int], field_two: Option[Int], field_three: Option[Int])

    // perform the join of the two tables
    val ds_result = spark.sql(
      "select o.field_one, t.field_two, null as field_three " +
      "from global_temp.table_one o, global_temp.table_two t where o.key_one = t.key_two"
    )
      .withColumn("field_one", $"field_one".cast(IntegerType))
      .withColumn("field_two", $"field_two".cast(IntegerType))
      .withColumn("field_three", $"field_three".cast(IntegerType))
      .as[ResultCaseClass]

    // success:
    ds_result.write......
    ...
asked Mar 21 '17 by Vladislav Dotsenko

1 Answer

I encountered the same problem. Digging into the Spark source code, I found where the error comes from: if you insert a null value into a database without specifying its data type, Spark cannot map it to a JDBC type and throws "Can't get JDBC type for null". The way to fix this is to cast the null to the data type that matches the corresponding database field's type.

example:

    lit(null).cast(StringType)
    // or, equivalently:
    lit(null).cast("string")
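Applied to the question's query, a minimal sketch of the fix (assuming field_three maps to an integer column in myResultTable; the actual column type is an assumption, so adjust the cast to your schema):

    import org.apache.spark.sql.functions.lit
    import org.apache.spark.sql.types.IntegerType

    // Cast the null directly in SQL so the column carries a concrete type:
    val df_result = spark.sql(
      "select o.field_one, t.field_two, cast(null as int) as field_three " +
      "from global_temp.table_one o, global_temp.table_two t where o.key_one = t.key_two"
    )

    // Equivalent fix with the DataFrame API:
    // df_result.withColumn("field_three", lit(null).cast(IntegerType))

With the null column typed, the original df_result.write ... .mode(SaveMode.Append).save() call should succeed without the Dataset/case-class workaround.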
answered Nov 14 '22 by lmnzh