
How can we convert an external table to managed table in SPARK 2.2.0?

Tags:

apache-spark

The below command was successfully converting external tables to managed tables in Spark 2.0.0:

ALTER TABLE {table_name} SET TBLPROPERTIES(EXTERNAL=FALSE);

However the above command is failing in Spark 2.2.0 with the below error:

Error in query: Cannot set or change the preserved property key: 'EXTERNAL';

Asked by oushnik dey on Jan 04 '18


People also ask

What is external table in spark?

The External Tables feature lets you read from and write to data sources stored outside of Vector. The data source must be one that Apache Spark is able to read from and write to, such as HDFS files stored in formats like Parquet, ORC, JSON, or tables in external database systems.

Can we create table using spark SQL?

Spark allows you to create two types of tables: managed and unmanaged. For a managed table, Spark manages both the metadata and the data in the file store.

How to convert locally managed table to external table in spark?

Use case: there is a lot of data in the locally managed table, and we want to convert it to an external table because our Spark and home-grown applications have trouble reading locally managed tables. Solution: alter table table_name SET TBLPROPERTIES ('EXTERNAL'='TRUE');
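The statement above can be sketched as a small helper that builds the ALTER TABLE string for either direction. The table name and the surrounding `SparkSession` (called `spark` here) are placeholders; note that Spark 2.2+ rejects this statement through `spark.sql` (that is the error in the question), so the generated SQL may need to be run through the Hive CLI instead.

```scala
// Sketch: build the Hive-style statement that flips the EXTERNAL
// table property. `table` is a hypothetical placeholder name.
def externalPropStatement(table: String, external: Boolean): String = {
  val value = if (external) "TRUE" else "FALSE"
  s"ALTER TABLE $table SET TBLPROPERTIES('EXTERNAL'='$value')"
}

// On Spark versions that still allow it (e.g. 2.0.x), run via:
//   spark.sql(externalPropStatement("my_table", external = true))
```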

What are the advantages of unmanaged tables in spark?

Unmanaged tables provide much more flexibility, as the table data can be stored in a location of your choice, or the table can be built directly on top of data available in an external directory. In turn, this means that in Spark, a location is mandatory for external tables.

How do I convert a managed table to an external table?

You can easily convert a managed table to external, provided it is not an ACID (transactional) table, using the ALTER TABLE statement. You might have a non-ACID managed table after an upgrade from Hive 1 or 2.

What is the difference between data and metadata in spark?

In particular, data is usually saved in the Spark SQL warehouse directory - that is the default for managed tables - whereas metadata is saved in a meta-store of relational entities (including databases, tables, temporary views) and can be accessed through an interface known as the “catalog”.
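To make the data/metadata split concrete: for a managed table, the data files land under the warehouse directory following the default `<database>.db/<table>` layout, while the metadata is queried through the catalog interface. The helper below is a sketch of that path convention; the session name `spark` and the table names are assumptions, not from the original post.

```scala
// Sketch: default on-disk location of a managed table's data under
// the Spark SQL warehouse directory (<db>.db/<table> convention).
def defaultTablePath(warehouseDir: String, db: String, table: String): String =
  s"${warehouseDir.stripSuffix("/")}/$db.db/$table"

// Metadata, by contrast, lives in the metastore and is reached through
// the catalog interface of a live SparkSession (hypothetical `spark`):
//   spark.conf.get("spark.sql.warehouse.dir")   // where data goes
//   spark.catalog.listTables("default").show()  // what the metastore knows
```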


1 Answer

As @AndyBrown pointed out in a comment, you have the option of dropping to the console and invoking the Hive statement there. In Scala this worked for me:

import sys.process._
// Shell out to the Hive CLI, which still allows changing the EXTERNAL
// property that Spark 2.2+ refuses to touch. {table_name} is a placeholder.
val exitCode = Seq("hive", "-e", "ALTER TABLE {table_name} SET TBLPROPERTIES(\"EXTERNAL\"=\"FALSE\")").!

I faced this problem using Spark 2.1.1 where @Joha's answer does not work because spark.sessionState is not accessible due to being declared lazy.
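For reference, the internal-catalog route that @Joha's answer apparently takes can be sketched as below. These names come from Spark 2.x internals (`org.apache.spark.sql.catalyst.catalog`), which are not a public API and may differ in your version, so treat this as an assumption to verify:

```scala
// Sketch (Spark 2.x internal API, not stable across versions):
// change the table type recorded in the session catalog.
//
// import org.apache.spark.sql.catalyst.TableIdentifier
// import org.apache.spark.sql.catalyst.catalog.CatalogTableType
//
// val catalog = spark.sessionState.catalog  // forces the lazy sessionState
// val meta    = catalog.getTableMetadata(TableIdentifier("my_table"))
// catalog.alterTable(meta.copy(tableType = CatalogTableType.MANAGED))
```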

Answered by Sebastian Hätälä on Sep 19 '22