Can't find uuid in org.apache.spark.sql.types.DataTypes

Tags:

uuid

We have a PostgreSQL table with a UUID column. How do we write a UUID field from a Spark Dataset (using Java) to the PostgreSQL database? We can't find a uuid type in org.apache.spark.sql.types.DataTypes.

Please advise.

Venu asked Nov 18 '17 17:11


1 Answer

As already pointed out, despite these resolved issues (10186, 5753), Spark still has no supported mapping for the Postgres uuid data type as of Spark 2.3.0.

However, there's a workaround: write with Spark's SaveMode.Append and set the Postgres JDBC property stringtype to unspecified, so that the server infers the column type of string values. In short, it works like this:

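    import org.apache.spark.sql.SaveMode
    import org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions

    val props = Map(
      JDBCOptions.JDBC_DRIVER_CLASS -> "org.postgresql.Driver",
      "url" -> url,
      "user" -> user,
      // Send strings untyped so the Postgres server can cast them to uuid
      "stringtype" -> "unspecified"
    )

    // Append to an existing table whose uuid column is already defined
    yourData.write.mode(SaveMode.Append)
      .format("jdbc")
      .options(props)
      .option("dbtable", tableName)
      .save()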
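
Since the question asks for Java, here is a minimal sketch of the same option map built with plain java.util types. It assumes the UUID values are kept in a string column on the Spark side; the class name UuidJdbcOptions, the helper names, and the sample URL, user and table are hypothetical and only serve the illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical helper, not part of Spark or the PostgreSQL driver.
final class UuidJdbcOptions {

    // Builds the options for Spark's JDBC writer.
    static Map<String, String> jdbcOptions(String url, String user, String table) {
        Map<String, String> opts = new HashMap<>();
        opts.put("driver", "org.postgresql.Driver");
        opts.put("url", url);
        opts.put("user", user);
        opts.put("dbtable", table);
        // PostgreSQL JDBC property: strings are sent untyped,
        // so the server can cast them to the uuid column type.
        opts.put("stringtype", "unspecified");
        return opts;
    }

    // Canonical text form of a UUID, suitable for a string column.
    static String toColumnValue(UUID id) {
        return id.toString();
    }

    public static void main(String[] args) {
        // Sample values are assumptions for illustration only.
        Map<String, String> opts = jdbcOptions(
                "jdbc:postgresql://localhost:5432/mydb", "spark_user", "my_table");
        System.out.println(opts.get("stringtype"));
        System.out.println(toColumnValue(
                UUID.fromString("123e4567-e89b-12d3-a456-426614174000")));
        // With Spark on the classpath, the write would then look like:
        // dataset.write().mode(SaveMode.Append).format("jdbc").options(opts).save();
    }
}
```

The map can be handed to DataFrameWriter.options, which accepts a java.util.Map<String, String>. The write call itself is left as a comment so the sketch compiles without Spark.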

The table must already exist, with the uuid column defined as type uuid. If you try to have Spark 2.3.0 create the table instead, you will hit a wall again:

    yourData.write.mode(SaveMode.Overwrite)
        .format("jdbc")
        .options(props)
        .option("dbtable", tableName)
        .option("createTableColumnTypes", "some_uuid_column_name uuid")
        .save()

Result:

    DataType uuid is not supported.(line 1, pos 21)
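
Because Spark cannot create the uuid column itself, one option is to create the table up front over plain JDBC and then append to it. The following is a rough sketch under assumptions, not a definitive implementation: the class CreateUuidTable, the extra payload column, and the way connection details are passed are all hypothetical, and the PostgreSQL driver is assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper that creates the target table before Spark appends to it.
final class CreateUuidTable {

    // Builds the DDL. Table and column names are assumed to be trusted
    // constants, since they are concatenated into the statement.
    static String createTableSql(String table, String uuidColumn) {
        return "CREATE TABLE IF NOT EXISTS " + table + " ("
                + uuidColumn + " uuid, "
                + "payload text)"; // "payload" is an illustrative extra column
    }

    // Runs the DDL over a plain JDBC connection.
    static void createTable(String url, String user, String password, String sql)
            throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, user, password);
             Statement stmt = conn.createStatement()) {
            stmt.executeUpdate(sql);
        }
    }

    public static void main(String[] args) {
        // Only prints the statement; call createTable(...) against a real database.
        System.out.println(createTableSql("my_table", "some_uuid_column_name"));
    }
}
```

Once the table exists, the SaveMode.Append write leaves the table definition alone, so the uuid column type is preserved.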

ecoe answered Nov 15 '22 10:11