I need to write about 1 million rows from a Spark DataFrame to MySQL, but the insert is too slow. How can I improve it?
Code below:
df = sqlContext.createDataFrame(rdd, schema)
df.write.jdbc(url='xx', table='xx', mode='overwrite')
One thing to check is the level of concurrency. Each DataFrame partition writes to MySQL over its own JDBC connection, so too many concurrent tasks can overwhelm the database, while too few leave the cluster idle. High concurrency is usually a benefit in Spark because of its fine-grained task scheduling, but for JDBC writes the partition count should be tuned to what the database can absorb.
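As a rough sketch (the URL, table name, credentials, and partition count below are placeholders, not from the question), you can cap the write concurrency by coalescing to a modest number of partitions before the write:

# Sketch only: coalescing caps the number of simultaneous JDBC
# connections the write opens against MySQL.
df.coalesce(8).write.jdbc(
    url='jdbc:mysql://host:3306/db?user=user&password=pass',
    table='my_table',
    mode='overwrite')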
To connect to MySQL from a Spark shell, point a JDBC URL at the MySQL server (the host and port must be set) and use the SQLContext's read/load API to access a table. The same URL is used for writes, and Connector/J connection properties can be appended to it.
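For instance (hostname, database, table, and credentials below are hypothetical), reading a table through such a JDBC URL looks like this:

# Sketch: placeholder connection details, adjust for your server.
df_in = sqlContext.read.format('jdbc').options(
    url='jdbc:mysql://localhost:3306/mydb',
    dbtable='my_table',
    user='user',
    password='pass').load()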
The answer at https://stackoverflow.com/a/10617768/3318517 worked for me: add rewriteBatchedStatements=true to the connection URL. (See Configuration Properties for Connector/J.)
My benchmark went from 3325 seconds to 42 seconds!
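Applied to the code in the question, a minimal sketch looks like the following (host, database, table, and credentials are placeholders):

# rewriteBatchedStatements=true tells Connector/J to rewrite batched
# single-row INSERTs into multi-row INSERT statements.
url = 'jdbc:mysql://localhost:3306/mydb?rewriteBatchedStatements=true'
df.write.jdbc(url=url, table='my_table', mode='overwrite',
              properties={'user': 'user', 'password': 'pass'})

Depending on your Spark version, the batchsize JDBC write option can also be tuned to control how many rows are sent per round trip.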