Spark createTableColumnTypes Not Resulting in user supplied schema

Question

Not sure why this isn't working but I'm just trying to apply the below but still getting spark's version of the schema for the table (mysql) containing text instead of varchar(128) I'm trying to specify. Trying to just create custom datatypes for my columns with jdbc write. Trying with spark 2.1.0:

  def df2DB(
    df: DataFrame,
    batchSize: Int,
    numPartitions: Int,
    database: String,
    table: String): Unit = {

    val mdb = new MetadataBuilder()
    mdb.putString("col1", "INT")
    mdb.putString("col2", "VARCHAR(128)")
    mdb.putString("col3", "VARCHAR(128)")
    val createTableColTypes = mdb.build().json

    df.write.format("jdbc")
      .option("createTableColumnTypes", createTableColTypes)
      .option("url", url)
      .option("dbtable", s"${database}.${table}")
      .option("user", user)
      .option("password", pass)
      .option("driver", driver)
      .option("batchsize", batchSize)
      .option("numPartitions", numPartitions)
      .save()
  }

I also tried this format without success:

df.write.format("jdbc")
  .mode(SaveMode.Overwrite)
  .option("url", url)
  .option("dbtable", s"${database}.${table}")
  .option("user", user)
  .option("password", pass)
  .option("driver", driver)
  .option("batchsize", batchSize)
  .option("numPartitions", numPartitions)
  .option("createTableColumnTypes", "COL1 INT, COL2 VARCHAR(128)" )
  .save()

Also even when I try using createTableOptions like this I'm getting a sql syntax error. I'm not finding any good examples of using these options either together or separately:

.option("createTableOptions", "CREATE TABLE tbl1 (col1 int, col2 VARCHAR(128))").save()

vatsal mevada · Accepted Answer

Instead of creating JSON of your column name to datatype, try using a comma separated list of column name with data type something like this:

def df2DB(
    df: DataFrame,
    batchSize: Int,
    numPartitions: Int,
    database: String,
    table: String): Unit = {

    df.write.format("jdbc")
      .option("createTableColumnTypes", "col1 INT, col2 VARCHAR(128), col3 VARCHAR(128)")
      .option("url", url)
      .option("dbtable", s"${database}.${table}")
      .option("user", user)
      .option("password", pass)
      .option("driver", driver)
      .option("batchsize", batchSize)
      .option("numPartitions", numPartitions)
      .save()
  }

Reference: https://github.com/apache/spark/blob/aa4cf2b19e4cf5588af7e2192e0e9f687cd84bc5/examples/src/main/python/sql/datasource.py#L210

Spark createTableColumnTypes Not Resulting in user supplied schema

Tags:

jdbc

apache-spark

horatio1701d

1 Answers

vatsal mevada

Recent Activity

Donate For Us

Spark createTableColumnTypes Not Resulting in user supplied schema

Tags:

jdbc

apache-spark

horatio1701d

1 Answers

vatsal mevada

Related questions

Recent Activity

Donate For Us