
Spark DataFrame InsertIntoJDBC - TableAlreadyExists Exception

Using Spark 1.4.0, I am trying to insert data from a Spark DataFrame into a MemSQL database (which should behave exactly like a MySQL database) using insertIntoJDBC(). However, I keep getting a runtime TableAlreadyExists exception.

First I create the MemSQL table like this:

CREATE TABLE IF NOT EXISTS table1 (id INT AUTO_INCREMENT PRIMARY KEY, val INT);

Then I create a simple dataframe in Spark and try to insert into MemSQL like this:

val df = sc.parallelize(Array(123,234)).toDF.toDF("val")
//df: org.apache.spark.sql.DataFrame = [val: int]

df.insertIntoJDBC("jdbc:mysql://172.17.01:3306/test?user=root", "table1", false)

java.lang.RuntimeException: Table table1 already exists.
asked Oct 02 '15 by DJElbow


3 Answers

This solution applies to general JDBC connections, although the answer by @wayne is probably a better solution for MemSQL specifically.

insertIntoJDBC seems to have been deprecated as of 1.4.0, and calling it actually delegates to write.jdbc().

df.write returns a DataFrameWriter object. If you want to append data to an existing table, you have to set the writer's save mode to "append" (the default, "error", throws if the table already exists, which is what you are seeing).

Another issue with the example in the question above is that the DataFrame schema didn't match the schema of the target table (the DataFrame only has a val column, while the table has id and val).
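For reference, the same setting can be written with the typed SaveMode enum instead of a string (a minimal sketch, assuming `df` is already defined as in the example below):

```scala
import org.apache.spark.sql.SaveMode

// Equivalent to df.write.mode("append"); SaveMode.ErrorIfExists is the
// default, which is why writing to the existing table threw an exception.
val dfWriter = df.write.mode(SaveMode.Append)
```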

The code below gives a working example from the Spark shell. I am using spark-shell --driver-class-path mysql-connector-java-5.1.36-bin.jar to start my spark-shell session.

import java.util.Properties

val prop = new Properties() 
prop.put("user", "root")
prop.put("password", "")  

val df = sc.parallelize(Array((1, 234), (2, 1233))).toDF("id", "val")
val dfWriter = df.write.mode("append")

dfWriter.jdbc("jdbc:mysql://172.17.01:3306/test", "table1", prop)
answered Oct 17 '22 by DJElbow


The insertIntoJDBC docs are actually incorrect: they say the table must already exist, but in fact if it does exist, the call throws an error, as you can see above:

https://github.com/apache/spark/blob/03cca5dce2cd7618b5c0e33163efb8502415b06e/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L264

We recommend using our MemSQL Spark connector, which you can find here:

https://github.com/memsql/memsql-spark-connector

If you include that library and import com.memsql.spark.connector._ in your code, you can use df.saveToMemSQL(...) to save your DataFrame to MemSQL. You can find documentation for our connector here:

http://memsql.github.io/memsql-spark-connector/latest/api/#com.memsql.spark.connector.DataFrameFunctions
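A minimal sketch of what a connector-based save might look like. The database name, table name, and argument list here are illustrative assumptions; the exact saveToMemSQL signature depends on the connector version, so check the DataFrameFunctions docs linked above:

```scala
import com.memsql.spark.connector._

// Hypothetical usage: append the DataFrame's rows to table1 in the test
// database, letting the connector handle MemSQL-specific load semantics.
df.saveToMemSQL("test", "table1")
```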

answered Oct 17 '22 by Wayne Song


I had the same issue. Updating the Spark version to 1.6.2 fixed it for me.

answered Oct 17 '22 by Dinesh Parmar