Inserting data into a static Hive partition using Spark SQL

I have trouble figuring out how to insert data into a static partition of a Hive table using Spark SQL. I can use code like this to write into dynamic partitions:

df.write.partitionBy("key").insertInto("my_table")

However, I can't figure out how to insert the data into a static partition. That is, I want to define the partition to which the entire DataFrame should be written, without having to add the partition column to the DataFrame.

I see static partitioning mentioned in the InsertIntoHiveTable class, so I guess it is supported. Is there a public API to do what I want?

asked Nov 08 '22 by Lukáš Lalinský

1 Answer

You can use

// Spark 1.x HiveContext: DESCRIBE FORMATTED returns rows with a single
// string column named "result"; the table's root path is on the "Location:" row.
DataFrame tableMeta = sqlContext.sql(String.format("DESCRIBE FORMATTED %s", tableName));
String location = tableMeta.filter("result LIKE 'Location:%'").first().getString(0);

and use a regex to extract the path from that row. Once you have the table's root location, you can construct the partition location like

String partitionLocation = location + "/" + partitionKey;

(partitionKey is something like dt=20160329/hr=21)
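
The regex step mentioned above is not shown; here is a minimal sketch, assuming the Spark 1.x single-column DESCRIBE FORMATTED output (the exact Pattern is illustrative and may need adjusting for your output):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The "Location:" row looks like "Location:    hdfs://nn/path/to/table";
// capture the first non-whitespace run after the label.
Matcher m = Pattern.compile("Location:\\s*(\\S+)").matcher(location);
if (m.find()) {
    location = m.group(1); // now just the table's root path
}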

Then, you can write to that path

df.write().parquet(partitionLocation);

(In my case, I do not include the partition columns when I build the DataFrame. I am not sure whether anything goes wrong when the partition columns are included.)
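
One caveat: writing files straight to the directory does not, by itself, tell the Hive metastore that the partition exists. A sketch of registering it afterwards, assuming a HiveContext-backed sqlContext and the illustrative dt/hr partition spec used above:

// Register the new partition with the metastore so Hive and Spark SQL can see it.
// (Partition column names and values here match the example dt=20160329/hr=21 spec.)
sqlContext.sql(String.format(
    "ALTER TABLE %s ADD IF NOT EXISTS PARTITION (dt='20160329', hr='21') LOCATION '%s'",
    tableName, partitionLocation));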

answered Nov 15 '22 by tpham