While reading the Datastax docs for supported syntax of Spark SQL, I noticed you can use INSERT
statements like you would normally do:
INSERT INTO hello (someId,name) VALUES (1,"hello")
Testing this out in a Spark 2.0 (Python) environment and a connection to a Mysql database, throws the error:
File "/home/yawn/spark-2.0.0-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 73, in deco
pyspark.sql.utils.ParseException:
u'\nmismatched input \'someId\' expecting {\'(\', \'SELECT\', \'FROM\', \'VALUES\', \'TABLE\', \'INSERT\', \'MAP\', \'REDUCE\'}(line 1, pos 19)\n\n== SQL ==\nINSERT INTO hello (someId,name) VALUES (1,"hello")\n-------------------^^^\n'
However if I remove the explicit column definition, it works as expected:
INSERT INTO hello VALUES (1,"hello")
Am I missing something?
Spark SQL is Apache Spark's module for working with structured data. The SQL Syntax section describes the SQL syntax in detail along with usage examples when applicable. This document provides a list of Data Definition and Data Manipulation Statements, as well as Data Retrieval and Auxiliary Statements.
Conclusion. In summary the difference between Hive INSERT INTO vs INSERT OVERWRITE, INSERT INTO is used to append the data into Hive tables and partitioned tables and INSERT OVERWRITE is used to remove the existing data from the table and insert the new data.
The INSERT OVERWRITE statement overwrites the existing data in the table using the new values. The inserted rows can be specified by value expressions or result from a query.
Spark support hive syntax so if you want to insert row you can do as follows
insert into hello select t.* from (select 1, 'hello') t;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With