
In Spark, does CREATE TABLE command create an external table?

Based on the following thread on GitHub (https://github.com/databricks/spark-csv/issues/45), I understand that CREATE TABLE + OPTIONS (like JDBC) will create a Hive external table. These types of tables don't materialize the data themselves, so no data is lost when the table is dropped via SQL or removed from the Databricks Tables UI. Is that correct?

jmdev asked Nov 25 '25 22:11


2 Answers

You can very well create an EXTERNAL table in Spark, but you have to take care to use HiveContext instead of SQLContext:

scala> import org.apache.spark.sql.hive._
import org.apache.spark.sql.hive._

scala> val hc = new HiveContext(sc)
hc: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@385ff04e

scala> hc.sql("create external table blah ( name string ) location 'hdfs:///tmp/blah'")
res0: org.apache.spark.sql.DataFrame = [result: string]
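In Spark 2.x, HiveContext is deprecated in favor of a Hive-enabled SparkSession. A minimal sketch of the equivalent, assuming a Hive-enabled Spark build (the table name and HDFS path are placeholders):

```scala
// Spark 2.x sketch: SparkSession with Hive support replaces HiveContext.
// The table name `blah` and the HDFS path are placeholders for illustration.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("external-table-example")
  .enableHiveSupport()   // required so DDL goes through the Hive metastore
  .getOrCreate()

spark.sql("CREATE EXTERNAL TABLE blah (name STRING) LOCATION 'hdfs:///tmp/blah'")
```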
Roberto Congiu answered Nov 28 '25 15:11


From the Spark SQL docs (2.3.1): https://spark.apache.org/docs/2.3.1/sql-programming-guide.html#hive-tables

In Spark SQL, CREATE TABLE ... LOCATION is equivalent to CREATE EXTERNAL TABLE ... LOCATION in order to prevent accidentally dropping the existing data in the user-provided locations. That means a Hive table created in Spark SQL with a user-specified location is always a Hive external table. Dropping an external table will not remove the data. Users are not allowed to specify the location for Hive managed tables. Note that this is different from the Hive behavior.
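One way to see this behavior in practice, sketched here assuming a Hive-enabled SparkSession named `spark` (the table name and path are placeholders):

```scala
// Assumes a Hive-enabled SparkSession `spark`; table and path are placeholders.
// Creating a table with an explicit LOCATION makes it external in Spark SQL.
spark.sql("CREATE TABLE t (id INT) USING parquet LOCATION '/tmp/t'")

// DESCRIBE EXTENDED shows the table metadata, including a Type row
// that reports EXTERNAL for location-backed tables.
spark.sql("DESCRIBE EXTENDED t").show(truncate = false)

// Dropping the table removes only the metastore entry;
// the files under /tmp/t are left in place.
spark.sql("DROP TABLE t")
```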

Paul Bendevis answered Nov 28 '25 15:11

