Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PySpark, order of column on write to MySQL with JDBC

I'm struggling a bit understanding spark and writing dataframes to a mysql database. I have the following code:

forecastDict = {'uuid': u'8df34d5a-ce02-4d02-b282-e10363690122', 'created_at': datetime.datetime(2014, 12, 31, 23, 0)}
forecastFrame =  sqlContext.createDataFrame([forecastDict])    
forecastFrame.write.jdbc(url="jdbc:mysql://example.com/example_db?user=bla&password=blabal123", table="example_table", mode="append")

The last line in the code throws the following error:

Incorrect datetime value: '8df34d5a-ce02-4d02-b282-e10363690122' for column 'created_at' at row 1

I can post the entire stack trace if necessary, but basically what's happening here is that the pyspark is mapping the uuid field to the wrong column in mysql. Here's the mysql definition:

mysql> show create table example_table;
...
CREATE TABLE `example_table` (
`uuid` varchar(36) NOT NULL,
`created_at` datetime NOT NULL,
PRIMARY KEY (`uuid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
...

If we change the mysql definition to the following (notice that only the order of the columns is different):

CREATE TABLE `example_table` (
`created_at` datetime NOT NULL,
`uuid` varchar(36) NOT NULL,
PRIMARY KEY (`uuid`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

The insert works fine. Is there a way to implement this without being dependent on the order of the columns, or what's the preferred way of saving data to an external relational database from spark?

Thanks!

--chris

like image 832
flooose Avatar asked Dec 02 '25 20:12

flooose


1 Answers

I would simply force expected order on write:

url = ...
table = ...

columns = (sqlContext.read.format('jdbc')
  .options(url=url, dbtable=table)
  .load()
  .columns())

forecastFrame.select(*columns).write.jdbc(url=url, dbtable=table, mode='append')

Also be careful with using schema inference on dictionaries. This is not only deprecated but also rather unstable.

like image 171
zero323 Avatar answered Dec 04 '25 17:12

zero323