 

Low JDBC write speed from Spark to MySQL

I need to write about 1 million rows from a Spark DataFrame to MySQL, but the insert is too slow. How can I improve it?

Code below:

df = sqlContext.createDataFrame(rdd, schema)
df.write.jdbc(url='xx', table='xx', mode='overwrite')
asked Apr 28 '16 by Takashi Lee


People also ask

Why is Spark SQL slow?

Sometimes Spark runs slowly because there are too many concurrent tasks running. High concurrency is normally a beneficial feature, since Spark-native fine-grained sharing maximizes resource utilization and cuts query latencies, but an excessive number of concurrent tasks adds overhead and slows the job down.

Is Spark SQL faster than SQL?

Extrapolating the average I/O rate across the duration of the tests (in which Big SQL is 3.2x faster than Spark SQL), Spark SQL actually reads almost 12x more data than Big SQL and writes 30x more data.

Can Spark connect to MySQL?

Start a Spark shell and connect to MySQL data: with the shell running, you can connect to MySQL with a JDBC URL and use the SQLContext load() function to read a table. The server and port properties must point to a MySQL server.
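
For reference, a minimal PySpark sketch of reading a MySQL table over JDBC. The host, database, table name, and credentials below are placeholders, and the MySQL Connector/J driver must be on the Spark classpath:

# Minimal sketch: read a MySQL table into a DataFrame over JDBC.
# Host, database, table and credentials are placeholders.
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="mysql-read-example")
sqlContext = SQLContext(sc)

df = (sqlContext.read
      .format("jdbc")
      .option("url", "jdbc:mysql://db-host:3306/mydb")  # MySQL server and port
      .option("driver", "com.mysql.jdbc.Driver")        # Connector/J driver class
      .option("dbtable", "my_table")
      .option("user", "username")
      .option("password", "password")
      .load())

df.show()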


1 Answer

The answer in https://stackoverflow.com/a/10617768/3318517 has worked for me. Add rewriteBatchedStatements=true to the connection URL. (See Configuration Properties for Connector/J.)

My benchmark went from 3325 seconds to 42 seconds!
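
For example, a sketch of what the write might look like with the flag appended to the URL; the host, database, table name, and credentials are placeholders:

# Sketch: rewriteBatchedStatements=true makes Connector/J batch the INSERTs
# instead of sending them one row at a time.
# Host, database, table and credentials are placeholders.
url = "jdbc:mysql://db-host:3306/mydb?rewriteBatchedStatements=true"

df.write.jdbc(
    url=url,
    table="my_table",
    mode="overwrite",
    properties={"user": "username",
                "password": "password",
                "driver": "com.mysql.jdbc.Driver"})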

answered Oct 10 '22 by Daniel Darabos