
"INSERT INTO ..." with SparkSQL HiveContext

I'm trying to run an insert statement with my HiveContext, like this:

hiveContext.sql('insert into my_table (id, score) values (1, 10)') 

The Spark SQL 1.5.2 documentation doesn't explicitly state whether this is supported, although it does mention support for "dynamic partition insertion".

Running the statement produces a stack trace like:

AnalysisException:
Unsupported language features in query: insert into my_table (id, score) values (1, 10)
TOK_QUERY 0, 0,20, 0
  TOK_FROM 0, -1,20, 0
    TOK_VIRTUAL_TABLE 0, -1,20, 0
      TOK_VIRTUAL_TABREF 0, -1,-1, 0
        TOK_ANONYMOUS 0, -1,-1, 0
      TOK_VALUES_TABLE 1, 13,20, 41
        TOK_VALUE_ROW 1, 15,20, 41
          1 1, 16,16, 41
          10 1, 19,19, 44
  TOK_INSERT 1, 0,-1, 12
    TOK_INSERT_INTO 1, 0,11, 12
      TOK_TAB 1, 4,4, 12
        TOK_TABNAME 1, 4,4, 12
          my_table 1, 4,4, 12
      TOK_TABCOLNAME 1, 7,10, 22
        id 1, 7,7, 22
        score 1, 10,10, 26
    TOK_SELECT 0, -1,-1, 0
      TOK_SELEXPR 0, -1,-1, 0
        TOK_ALLCOLREF 0, -1,-1, 0

scala.NotImplementedError: No parse rules for:
 TOK_VIRTUAL_TABLE 0, -1,20, 0
  TOK_VIRTUAL_TABREF 0, -1,-1, 0
    TOK_ANONYMOUS 0, -1,-1, 0
  TOK_VALUES_TABLE 1, 13,20, 41
    TOK_VALUE_ROW 1, 15,20, 41
      1 1, 16,16, 41
      10 1, 19,19, 44

Is there any other supported way to insert into a Hive table?

Kirk Broadhurst asked Nov 25 '15 17:11




1 Answer

Data can be appended to a Hive table using the append mode on the DataFrameWriter.

data = hc.sql("select 1 as id, 10 as score")
data.write.mode("append").saveAsTable("my_table")

This gives the same result as an insert.
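When the values already live in Python rather than in a query, the same append can probably be done by building the DataFrame with createDataFrame instead of a SELECT statement. The following is only a sketch under assumptions: append_rows is a hypothetical helper (not part of the Spark API), hc is assumed to be an existing HiveContext, and the target table is assumed to exist with columns whose names and types are compatible with the inferred ones (Python ints are inferred as long, which may not match an int column).

```python
def append_rows(hc, table, rows, columns):
    """Append local rows to an existing Hive table.

    Hypothetical helper for illustration only.
    hc      -- an existing HiveContext (assumed to be set up elsewhere)
    table   -- name of the target Hive table, e.g. "my_table"
    rows    -- list of tuples, one tuple per row to append
    columns -- list of column names, in the same order as the tuple fields
    """
    # Build a DataFrame from local Python data; column types are inferred.
    df = hc.createDataFrame(rows, columns)
    # Append mode adds the rows to the table instead of replacing it.
    df.write.mode("append").saveAsTable(table)
    return df
```

For the row in the question, the call would look like append_rows(hiveContext, "my_table", [(1, 10)], ["id", "score"]).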

Kirk Broadhurst answered Nov 29 '22 05:11
