 

Problem with saving spark DataFrame as Hive table

I have two Spark DataFrames. The first one was received from a Hive table using HiveContext:

spark_df1 = hc.sql("select * from testdb.titanic_pure_data_test")    

The second DataFrame was built from a .csv file:

lines = sc.textFile("hdfs://HDFS-1/home/testdb/1500000_Sales_Records.csv").map(lambda line: line.split(","))    

spark_df_test = lines.toDF(['Region','Country','Item_Type','Sales_Channel','Order_Priority','Order_Date','Order_ID','Ship_Date','Units_Sold','Unit_Price','Unit_Cost','Total_Revenue','Total_Cost','Total_Profit'])
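As an aside, splitting each line on a bare comma breaks on quoted fields that themselves contain commas, and every resulting column comes out as a string. A minimal sketch of the pitfall using Python's standard `csv` module (the sample line and its values are invented for illustration, not taken from the real file):

```python
import csv
import io

# A sample line in the same 14-column shape as the sales CSV; values are made up.
line = 'Sub-Saharan Africa,"Congo, Dem. Rep.",Cereal,Online,H,1/1/2015,100,1/9/2015,5,205.70,117.11,1028.50,585.55,442.95'

# Naive split: the comma inside the quoted country name tears the field apart.
naive = line.split(",")

# The csv module respects the quoting and yields the intended 14 fields.
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive))   # 15 fields -- one too many
print(len(parsed))  # 14 fields
print(parsed[1])    # Congo, Dem. Rep.
```

In the original job, the mapper could use `lambda line: next(csv.reader([line]))` instead of `lambda line: line.split(",")`, assuming no fields contain embedded newlines.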

I want to save either DataFrame as a Hive table:

spark_df1.write.mode("overwrite").format("orc").saveAsTable("testdb.new_res5")

The first DataFrame saves without problems, but when I try to save the second DataFrame (spark_df_test) the same way, I get this error:

File "/home/jup-user/testdb/scripts/caching.py", line 90, in <module>
    spark_df_test.write.mode("overwrite").format("orc").saveAsTable("testdb.new_res5")
File "/data_disk/opt/cloudera/parcels/CDH-5.15.1-1.cdh5.15.1.p0.4/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 435, in saveAsTable
File "/data_disk/opt/cloudera/parcels/CDH-5.15.1-1.cdh5.15.1.p0.4/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/data_disk/opt/cloudera/parcels/CDH-5.15.1-1.cdh5.15.1.p0.4/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 51, in deco
pyspark.sql.utils.AnalysisException: 'Specifying database name or other qualifiers are not allowed for temporary tables. If the table name has dots (.) in it, please quote the table name with backticks (`).;'

Vladimir Sazonov asked Oct 26 '18 14:10

1 Answer

The problem is that you are trying to overwrite an existing Hive table with a DataFrame that has a different schema. Spark can't do this right now.

The reason is in Spark's table-saving logic: it checks whether the table already exists and throws an exception if it does. The ideal way is to save the DataFrame to a new table:

spark_df_test.write.mode("overwrite").format("orc").saveAsTable("testdb.new_res6")

Or you can use `insertInto`. First save the DataFrame to an intermediate table:

spark_df_test.write.mode("overwrite").saveAsTable("temp_table")

Then you can overwrite the rows in your target table:

temp_table = sqlContext.table("temp_table")
temp_table.write.mode("overwrite").insertInto("testdb.new_res5")
Avishek Bhattacharya answered Oct 16 '22 18:10