I'm struggling a bit with understanding Spark and writing DataFrames to a MySQL database. I have the following code:
import datetime

forecastDict = {'uuid': u'8df34d5a-ce02-4d02-b282-e10363690122', 'created_at': datetime.datetime(2014, 12, 31, 23, 0)}
forecastFrame = sqlContext.createDataFrame([forecastDict])
forecastFrame.write.jdbc(url="jdbc:mysql://example.com/example_db?user=bla&password=blabal123", table="example_table", mode="append")
The last line in the code throws the following error:
Incorrect datetime value: '8df34d5a-ce02-4d02-b282-e10363690122' for column 'created_at' at row 1
I can post the entire stack trace if necessary, but basically what's happening here is that PySpark is mapping the uuid field to the wrong column in MySQL. Here's the MySQL table definition:
mysql> show create table example_table;
...
CREATE TABLE `example_table` (
`uuid` varchar(36) NOT NULL,
`created_at` datetime NOT NULL,
PRIMARY KEY (`uuid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
...
If we change the mysql definition to the following (notice that only the order of the columns is different):
CREATE TABLE `example_table` (
`created_at` datetime NOT NULL,
`uuid` varchar(36) NOT NULL,
PRIMARY KEY (`uuid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
The insert works fine. Is there a way to implement this without depending on the order of the columns, or what's the preferred way of saving data from Spark to an external relational database?
Thanks!
--chris
I would simply force the expected column order on write:
url = ...
table = ...
# Read back the target table's column order and reorder the DataFrame to match
columns = (sqlContext.read.format('jdbc')
           .options(url=url, dbtable=table)
           .load()
           .columns)  # DataFrame.columns is a property, not a method

forecastFrame.select(*columns).write.jdbc(url=url, table=table, mode='append')
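If the target schema is known up front (as it is in your table definition), a simpler variant is to hard-code the column order and skip the extra read; a minimal sketch using the column names from your question:

forecastFrame.select('uuid', 'created_at').write.jdbc(url=url, table=table, mode='append')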
Also, be careful with schema inference on dictionaries: passing a list of dicts to createDataFrame is not only deprecated but also rather unstable; Spark's deprecation warning suggests using pyspark.sql.Row or an explicit schema instead.
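For example, declaring the schema explicitly removes inference entirely and fixes the column order to whatever you declare; a minimal sketch reusing the column names and types from your table:

from pyspark.sql.types import StructType, StructField, StringType, TimestampType

# Explicit schema: names, types, and column order are fixed up front,
# so nothing depends on dictionary key ordering or inference.
schema = StructType([
    StructField('uuid', StringType(), False),
    StructField('created_at', TimestampType(), False),
])

forecastFrame = sqlContext.createDataFrame(
    [(forecastDict['uuid'], forecastDict['created_at'])],
    schema,
)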