Problem with saving spark DataFrame as Hive table

Question

I have two Spark DataFrames. One of them comes from a Hive table, read through HiveContext:

spark_df1 = hc.sql("select * from testdb.titanic_pure_data_test")

The second DataFrame comes from a .csv file:

lines = sc.textFile("hdfs://HDFS-1/home/testdb/1500000_Sales_Records.csv").map(lambda line: line.split(","))

spark_df_test = lines.toDF(['Region','Country','Item_Type','Sales_Channel','Order_Priority','Order_Date','Order_ID','Ship_Date','Units_Sold','Unit_Price','Unit_Cost','Total_Revenue','Total_Cost','Total_Profit'])

I want to save either DataFrame as a Hive table:

spark_df1.write.mode("overwrite").format("orc").saveAsTable("testdb.new_res5")

The first DataFrame is saved without problems, but when I try to save the second one (spark_df_test) the same way, I get this error:

  File "/home/jup-user/testdb/scripts/caching.py", line 90, in <module>
    spark_df_test.write.mode("overwrite").format("orc").saveAsTable("testdb.new_res5")
  File "/data_disk/opt/cloudera/parcels/CDH-5.15.1-1.cdh5.15.1.p0.4/lib/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 435, in saveAsTable
  File "/data_disk/opt/cloudera/parcels/CDH-5.15.1-1.cdh5.15.1.p0.4/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/data_disk/opt/cloudera/parcels/CDH-5.15.1-1.cdh5.15.1.p0.4/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 51, in deco
pyspark.sql.utils.AnalysisException: 'Specifying database name or other qualifiers are not allowed for temporary tables. If the table name has dots (.) in it, please quote the table name with backticks (`).;'

Avishek Bhattacharya · Accepted Answer

The problem is that you are trying to overwrite the same Hive table with a different DataFrame. Spark does not currently support this.

The reason is a check in Spark's code that throws an exception if the table already exists. The simplest way around it is to save the DataFrame to a new table:

spark_df_test.write.mode("overwrite").format("orc").saveAsTable("testdb.new_res6")
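Before picking a new table name, it can help to confirm that the name is not already taken. The following is only a minimal sketch, assuming `hc` is the HiveContext from the question; `table_exists` is a hypothetical helper name, while `tableNames(dbName)` is the listing method that SQLContext and HiveContext provide (it may also list temporary tables).

```python
def table_exists(sql_ctx, database, table):
    """Return True if `table` is already listed in `database`.

    `sql_ctx` is assumed to be a HiveContext/SQLContext such as `hc`
    from the question. `table_exists` itself is a hypothetical helper,
    not part of PySpark.
    """
    # Hive keeps table names in lowercase, so compare case-insensitively.
    existing = {name.lower() for name in sql_ctx.tableNames(database)}
    return table.lower() in existing


# Possible usage with the objects from the question (not executed here):
# if not table_exists(hc, "testdb", "new_res6"):
#     spark_df_test.write.mode("overwrite").format("orc").saveAsTable("testdb.new_res6")
```

The helper only reads the catalog, so it is safe to call before deciding which table to write to.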

Alternatively, you can use insertInto. First save the DataFrame to an intermediate table:

spark_df_test.write.mode("overwrite").saveAsTable("temp_table")

Then overwrite the rows in your target table from it (Scala syntax):

val tempTable = sqlContext.table("temp_table")
tempTable
  .write
  .mode("overwrite")
  .insertInto("testdb.new_res5")
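The snippet above is Scala, while the question uses PySpark. A rough PySpark sketch of the same two-step flow might look like the following. The function name `overwrite_via_staging` and its parameters are hypothetical; `sql_ctx` is assumed to be the HiveContext (`hc`) from the question, and the target table is assumed to exist already. In some PySpark versions the `overwrite` argument of `insertInto` takes precedence over an earlier `mode("overwrite")`, so the sketch passes `overwrite=True` explicitly.

```python
def overwrite_via_staging(df, sql_ctx, staging_table, target_table):
    """Replace the rows of `target_table` with `df` via a staging table.

    Hypothetical helper: `df` is a Spark DataFrame and `sql_ctx` is
    assumed to be a HiveContext/SQLContext such as `hc`.
    """
    # Step 1: write the new rows to a separate staging table.
    df.write.mode("overwrite").saveAsTable(staging_table)

    # Step 2: read the staging table back and overwrite the target.
    # insertInto matches columns by position, not by name.
    staged = sql_ctx.table(staging_table)
    staged.write.insertInto(target_table, overwrite=True)


# Possible usage with the objects from the question (not executed here):
# overwrite_via_staging(spark_df_test, hc, "temp_table", "testdb.new_res5")
```

Because insertInto is position-based, the staging table's column order and types should line up with those of the target table.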

Tags: python, apache-spark, pyspark, hive

Vladimir Sazonov
