The following command successfully converted external tables to managed tables in Spark 2.0.0:
ALTER TABLE {table_name} SET TBLPROPERTIES(EXTERNAL=FALSE);
However, the same command fails in Spark 2.2.0 with the following error:
Error in query: Cannot set or change the preserved property key: 'EXTERNAL';
The External Tables feature lets you read from and write to data sources stored outside of Vector. The data source must be one that Apache Spark is able to read from and write to, such as HDFS files stored in formats like Parquet, ORC, JSON, or tables in external database systems.
Spark allows you to create two types of tables: managed and unmanaged. For a managed table, Spark manages both the metadata and the data in the file store.
Use case: There is a lot of data in locally managed tables and we want to convert those tables into external tables, because our Spark and home-grown applications have trouble reading locally managed tables. Solution: ALTER TABLE table_name SET TBLPROPERTIES ('EXTERNAL'='TRUE');
Unmanaged tables provide much more flexibility, as the table data can be stored in a location of your choice, or the table can be built directly on top of data available in an external directory. In turn, this means that in Spark, a location is mandatory for external tables.
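To make the distinction concrete, here is a minimal sketch of creating one table of each type. A SparkSession named spark is assumed, and the table names and the path /tmp/external_tbl_data are placeholders:

// Managed table: Spark owns both the metadata and the data (kept under the warehouse directory).
spark.sql("CREATE TABLE managed_tbl (id INT, name STRING) USING parquet")

// Unmanaged (external) table: Spark tracks only the metadata; the data stays at the given LOCATION.
spark.sql("CREATE TABLE external_tbl (id INT, name STRING) USING parquet LOCATION '/tmp/external_tbl_data'")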
Convert a managed table to external: You can easily convert a managed table to external using the ALTER TABLE statement, provided it is not an ACID (transactional) table. You might have a non-ACID managed table after an upgrade from Hive 1 or 2.
In particular, data is usually saved in the Spark SQL warehouse directory (the default for managed tables), whereas metadata is saved in a metastore of relational entities (including databases, tables, and temporary views) and can be accessed through an interface known as the “catalog”.
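For example, you can check whether a table is managed or external through that catalog interface. A small sketch, assuming a SparkSession named spark and the default database:

// Lists tables in the "default" database; the tableType column shows MANAGED or EXTERNAL.
spark.catalog.listTables("default")
  .select("name", "tableType", "isTemporary")
  .show(false)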
As @AndyBrown pointed out in a comment, you have the option of dropping to the console and invoking the Hive statement there. In Scala this worked for me:
import sys.process._
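// Shell out to the Hive CLI, which still accepts setting the EXTERNAL table property.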
val exitCode = Seq("hive", "-e", "ALTER TABLE {table_name} SET TBLPROPERTIES(\"EXTERNAL\"=\"FALSE\")").!
I faced this problem using Spark 2.1.1 where @Joha's answer does not work because spark.sessionState is not accessible due to being declared lazy.
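For reference, on versions where spark.sessionState is accessible, the catalog-based workaround looks roughly like the sketch below. It relies on Spark's internal catalyst catalog API, so treat it as a sketch rather than a supported interface; the table name my_table and database default are placeholders, and a SparkSession named spark is assumed:

import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.CatalogTableType

// Read the current table definition from the session catalog and rewrite its type.
val identifier = TableIdentifier("my_table", Some("default"))
val oldTable = spark.sessionState.catalog.getTableMetadata(identifier)
// Use CatalogTableType.MANAGED instead to go the other way (external -> managed).
val alteredTable = oldTable.copy(tableType = CatalogTableType.EXTERNAL)
spark.sessionState.catalog.alterTable(alteredTable)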