How sparksql can batch insert data to mysql?

All, I recently needed to insert about 100,000,000 rows into MySQL using Spark SQL, but the write is quite slow: it takes about 1 hour. Does anyone have a more efficient way to do this insert?

MySQL configuration:

prop.setProperty("user", "user")
prop.setProperty("password", "password")
prop.setProperty("rewriteBatchedStatements", "true")
prop.setProperty("batchsize", "1000000")
prop.setProperty("numPartitions", "3000") 
asked Jan 27 '23 by whathaha

1 Answer

Try adding ?rewriteBatchedStatements=true to your MySQL URI. Your URI would look something like this: jdbc:mysql://host:port/db?rewriteBatchedStatements=true
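
A minimal sketch of the same idea using the option-based DataFrameWriter API (the DataFrame df, host, port, database, credentials and table name are placeholders, not taken from the original post):

    import org.apache.spark.sql.{DataFrame, SaveMode}

    // Hypothetical helper: appends df to MySQL with batched statement
    // rewriting enabled directly in the JDBC URL.
    def appendWithRewrite(df: DataFrame): Unit = {
      df.write
        .format("jdbc")
        .option("url", "jdbc:mysql://host:3306/db?rewriteBatchedStatements=true")
        .option("dbtable", "TABLE_NAME")
        .option("user", "user")
        .option("password", "password")
        .mode(SaveMode.Append)
        .save()
    }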

Hope it helps.

Update

I tried it and it worked for me. Setting the driver property did the trick.

import java.util.Properties

val prop = new Properties()

prop.setProperty("user", dbUser)
prop.setProperty("password", dbPassword)
prop.setProperty("driver", "com.mysql.jdbc.Driver")  // load the MySQL JDBC driver explicitly

// Append the DataFrame, with rewriteBatchedStatements enabled in the URL
df.write.mode("append").jdbc("jdbc:mysql://" + dbHost + "/" + dbName + "?rewriteBatchedStatements=true", "TABLE_NAME", prop)

I also had to add the MySQL Connector/J dependency:

    <dependency>
        <groupId>mysql</groupId>
        <artifactId>mysql-connector-java</artifactId>
        <version>5.1.39</version>
    </dependency>
answered Feb 11 '23 by Álvaro Valencia