Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Spark createTableColumnTypes Not Resulting in user supplied schema

Not sure why this isn't working but I'm just trying to apply the below but still getting spark's version of the schema for the table (mysql) containing text instead of varchar(128) I'm trying to specify. Trying to just create custom datatypes for my columns with jdbc write. Trying with spark 2.1.0:

  def df2DB(
    df: DataFrame,
    batchSize: Int,
    numPartitions: Int,
    database: String,
    table: String): Unit = {

    val mdb = new MetadataBuilder()
    mdb.putString("col1", "INT")
    mdb.putString("col2", "VARCHAR(128)")
    mdb.putString("col3", "VARCHAR(128)")
    val createTableColTypes = mdb.build().json

    df.write.format("jdbc")
      .option("createTableColumnTypes", createTableColTypes)
      .option("url", url)
      .option("dbtable", s"${database}.${table}")
      .option("user", user)
      .option("password", pass)
      .option("driver", driver)
      .option("batchsize", batchSize)
      .option("numPartitions", numPartitions)
      .save()
  }

I also tried this format without success:

df.write.format("jdbc")
  .mode(SaveMode.Overwrite)
  .option("url", url)
  .option("dbtable", s"${database}.${table}")
  .option("user", user)
  .option("password", pass)
  .option("driver", driver)
  .option("batchsize", batchSize)
  .option("numPartitions", numPartitions)
  .option("createTableColumnTypes", "COL1 INT, COL2 VARCHAR(128)" )
  .save()

Also even when I try using createTableOptions like this I'm getting a sql syntax error. I'm not finding any good examples of using these options either together or separately:

.option("createTableOptions", "CREATE TABLE tbl1 (col1 int, col2 VARCHAR(128))").save()
like image 572
horatio1701d Avatar asked Dec 02 '17 11:12

horatio1701d


1 Answers

Instead of creating JSON of your column name to datatype, try using a comma separated list of column name with data type something like this:

def df2DB(
    df: DataFrame,
    batchSize: Int,
    numPartitions: Int,
    database: String,
    table: String): Unit = {

    df.write.format("jdbc")
      .option("createTableColumnTypes", "col1 INT, col2 VARCHAR(128), col3 VARCHAR(128)")
      .option("url", url)
      .option("dbtable", s"${database}.${table}")
      .option("user", user)
      .option("password", pass)
      .option("driver", driver)
      .option("batchsize", batchSize)
      .option("numPartitions", numPartitions)
      .save()
  }

Reference: https://github.com/apache/spark/blob/aa4cf2b19e4cf5588af7e2192e0e9f687cd84bc5/examples/src/main/python/sql/datasource.py#L210

like image 57
vatsal mevada Avatar answered Oct 21 '22 15:10

vatsal mevada