Inserting data into a static Hive partition using Spark SQL

I have trouble figuring out how to insert data into a static partition of a Hive table using Spark SQL. I can use code like this to write into dynamic partitions:

df.write.partitionBy("key").insertInto("my_table")

However, I can't figure out how to insert the data into a static partition. That is, I want to define the partition to which the entire DataFrame should be written, without having to add the partition column to the DataFrame.

I see static partitioning mentioned in the InsertIntoHiveTable class, so I guess it is supported. Is there a public API to do what I want?

asked Nov 08 '22 by Lukáš Lalinský

1 Answer

You can use

// Spark 1.x HiveContext: DESCRIBE FORMATTED returns rows with a single
// string column named "result"; the table's root path is on the "Location:" row.
DataFrame tableMeta = sqlContext.sql(String.format("DESCRIBE FORMATTED %s", tableName));
String location = tableMeta.filter("result LIKE 'Location:%'").first().getString(0);

and use a regex to extract the path from that row. Once you have the table's root location, you can construct the partition location like

String partitionLocation = location + "/" + partitionKey;

(partitionKey is something like dt=20160329/hr=21)
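
The regex step mentioned above is not shown; here is a minimal sketch, assuming the Spark 1.x single-column DESCRIBE FORMATTED output (the exact Pattern is illustrative and may need adjusting for your output):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The "Location:" row looks like "Location:    hdfs://nn/path/to/table";
// capture the first non-whitespace run after the label.
Matcher m = Pattern.compile("Location:\\s*(\\S+)").matcher(location);
if (m.find()) {
    location = m.group(1); // now just the table's root path
}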

Then, you can write to that path

df.write().parquet(partitionLocation);

(In my case, I do not include the partition columns when I build the DataFrame. I am not sure whether anything goes wrong when the partition columns are included.)
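
One caveat: writing files straight to the directory does not, by itself, tell the Hive metastore that the partition exists. A sketch of registering it afterwards, assuming a HiveContext-backed sqlContext and the illustrative dt/hr partition spec used above:

// Register the new partition with the metastore so Hive and Spark SQL can see it.
// (Partition column names and values here match the example dt=20160329/hr=21 spec.)
sqlContext.sql(String.format(
    "ALTER TABLE %s ADD IF NOT EXISTS PARTITION (dt='20160329', hr='21') LOCATION '%s'",
    tableName, partitionLocation));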

answered Nov 15 '22 by tpham