
Read from a Hive table and write back to it using Spark SQL

I am reading a Hive table using Spark SQL and assigning it to a Scala val:

val x = sqlContext.sql("select * from some_table")

Then I am doing some processing on the DataFrame x and finally coming up with a DataFrame y, which has the exact same schema as the table some_table.

Finally, I am trying to insert-overwrite the DataFrame y into the same Hive table some_table:

y.write.mode(SaveMode.Overwrite).insertInto("some_table")

Then I get the error:

org.apache.spark.sql.AnalysisException: Cannot insert overwrite into table that is also being read from

I tried creating an INSERT OVERWRITE statement and running it with sqlContext.sql(), but it gave me the same error.

Is there any way I can bypass this error? I need to insert the records back into the same table.


Update: I tried doing as suggested, but I am still getting the same error:

val x = sqlContext.sql("select * from incremental.test2")
val y = x.limit(5)
y.registerTempTable("temp_table")
val dy = sqlContext.table("temp_table")
dy.write.mode("overwrite").insertInto("incremental.test2")

org.apache.spark.sql.AnalysisException: Cannot insert overwrite into table that is also being read from.;
asked Aug 03 '16 by Avi


People also ask

Can you use Spark SQL to access data from Hive?

Spark SQL also supports reading and writing data stored in Apache Hive. However, since Hive has a large number of dependencies, these dependencies are not included in the default Spark distribution. If Hive dependencies can be found on the classpath, Spark will load them automatically.
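For example, Hive support is enabled when building the session (a minimal sketch using the Spark 2.x API; the app name is a placeholder):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive-read-write")        // placeholder app name
  .enableHiveSupport()               // requires Hive dependencies on the classpath
  .getOrCreate()

val df = spark.sql("select * from some_table")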

Does Spark SQL return a DataFrame?

The sql function on a SparkSession enables applications to run SQL queries programmatically and returns the result as a DataFrame.
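For instance (a minimal sketch; it assumes a SparkSession named spark and the table some_table from the question):

val df = spark.sql("select count(*) as cnt from some_table")
df.show()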


4 Answers

Actually, you can also use checkpointing to achieve this. Since checkpointing breaks the data lineage, Spark is no longer able to detect that you are reading from and overwriting the same table:

sqlContext.sparkContext.setCheckpointDir(checkpointDir) // checkpointDir: any HDFS or local path of your choice
val ds = sqlContext.sql("select * from some_table").checkpoint()
ds.write.mode("overwrite").saveAsTable("some_table")
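checkpoint() eagerly materializes the query result into the checkpoint directory, so the subsequent write no longer depends on the table's underlying files. Note that Dataset.checkpoint() is available from Spark 2.1 onwards.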
answered Sep 16 '22 by nsanglar


You should first save your DataFrame y to a temporary table:

y.write.mode("overwrite").saveAsTable("temp_table")

Then you can overwrite rows in your target table:

val dy = sqlContext.table("temp_table")
dy.write.mode("overwrite").insertInto("some_table")
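Afterwards you may want to drop the intermediate table; this cleanup step is my addition, not part of the original answer:

sqlContext.sql("drop table if exists temp_table")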
answered Sep 19 '22 by cheseaux


You should first save your DataFrame y as a Parquet file:

y.write.parquet("temp_table")

Then load it back:

val parquetFile = sqlContext.read.parquet("temp_table")

And finally, insert the data into your table:

parquetFile.write.insertInto("some_table")
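Two things worth noting (my additions, not part of the original answer): the "temp_table" argument to write.parquet is a filesystem path, not a Hive table name, and insertInto matches columns by position, so the Parquet file's column order must match the target table's schema.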
answered Sep 16 '22 by matteus silva


In the context of Spark 2.2:

  1. This error means that the process is reading from and writing to the same table.
  2. Normally, this should work, as the process first writes to a staging directory (.hiveStaging...).
  3. The error occurs with the saveAsTable method, because it overwrites the entire table instead of individual partitions.
  4. The error should not occur with the insertInto method, because it overwrites partitions rather than the whole table.
  5. One reason this happens is that the Hive table has the following Spark TBLPROPERTIES in its definition. The problem is solved for the insertInto method if you remove them (see the ALTER TABLE sketch after the property list):

'spark.sql.partitionProvider', 'spark.sql.sources.provider', 'spark.sql.sources.schema.numPartCols', 'spark.sql.sources.schema.numParts', 'spark.sql.sources.schema.part.0', 'spark.sql.sources.schema.part.1', 'spark.sql.sources.schema.part.2', 'spark.sql.sources.schema.partCol.0', 'spark.sql.sources.schema.partCol.1'
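If you go this route, the properties can be removed with HiveQL, for example from the Hive shell or beeline (a sketch; some_table stands in for your table name):

ALTER TABLE some_table UNSET TBLPROPERTIES IF EXISTS (
  'spark.sql.partitionProvider', 'spark.sql.sources.provider',
  'spark.sql.sources.schema.numPartCols', 'spark.sql.sources.schema.numParts',
  'spark.sql.sources.schema.part.0', 'spark.sql.sources.schema.part.1',
  'spark.sql.sources.schema.part.2', 'spark.sql.sources.schema.partCol.0',
  'spark.sql.sources.schema.partCol.1');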

https://querydb.blogspot.com/2019/07/read-from-hive-table-and-write-back-to.html

When we upgraded our HDP to 2.6.3, Spark was updated from 2.2 to 2.3, which resulted in the error below:

Caused by: org.apache.spark.sql.AnalysisException: Cannot overwrite a path that is also being read from.;

at org.apache.spark.sql.execution.command.DDLUtils$.verifyNotReadPath(ddl.scala:906)

This error occurs for jobs in which we read from and write to the same path, such as jobs with SCD (slowly changing dimension) logic.

Solution:

  1. Set --conf "spark.sql.hive.convertMetastoreOrc=false"
  2. Or, update the job so that it writes the data to a temporary table, then reads from the temporary table and inserts it into the final table (a sketch follows below).
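A minimal sketch of option 2, assuming the Spark 2.x API; the table names db.final_table and db.tmp_table and the helper applyScdLogic are placeholders for illustration:

val df = spark.sql("select * from db.final_table")                // read the source
val transformed = applyScdLogic(df)                               // placeholder for your SCD logic
transformed.write.mode("overwrite").saveAsTable("db.tmp_table")   // stage into a temp table
spark.table("db.tmp_table")
  .write.mode("overwrite")
  .insertInto("db.final_table")                                   // overwrite the final table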

https://querydb.blogspot.com/2020/09/orgapachesparksqlanalysisexception.html

answered Sep 16 '22 by dinesh028