I'm having trouble figuring out how to insert data into a static partition of a Hive table using Spark SQL. I can use code like this to write into dynamic partitions:
df.write.partitionBy("key").insertInto("my_table")
However, I can't figure out how to insert the data into a static partition. That is, I want to define the partition into which the entire DataFrame is written, without needing to add the partition column to the DataFrame.
I see static partitioning mentioned in the InsertIntoHiveTable class, so I guess it is supported. Is there a public API to do what I want?
You can use
DataFrame tableMeta = sqlContext.sql(String.format("DESCRIBE FORMATTED %s", tableName));
String location = tableMeta.filter("result LIKE 'Location:%'").first().getString(0);
and use a regex to extract the table location from that line.
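For the extraction step, a minimal sketch (assuming the matched line looks like "Location:  hdfs://namenode/user/hive/warehouse/my_table", which is the typical DESCRIBE FORMATTED output; verify against your Hive version):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Capture the first non-whitespace run after the "Location:" label.
Matcher m = Pattern.compile("Location:\\s*(\\S+)").matcher(location);
if (m.find()) {
    location = m.group(1);
}

Once you have the table location, you can easily construct the partition location like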
String partitionLocation = location + "/" + partitionKey;
(where partitionKey is something like dt=20160329/hr=21)
Then, you can write to that path
df.write().parquet(partitionLocation);
(In my case, I do not include the partition columns when building the DataFrame; I have not verified whether including them causes an error.)
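Putting the steps together, a minimal end-to-end sketch (assumes a Spark 1.x HiveContext; the partition column names dt and hr are placeholders for your own schema):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public static void writeToStaticPartition(HiveContext sqlContext, DataFrame df,
                                          String tableName, String partitionKey) {
    // Look up the table's root location from the metastore output.
    DataFrame tableMeta = sqlContext.sql(String.format("DESCRIBE FORMATTED %s", tableName));
    String locationLine = tableMeta.filter("result LIKE 'Location:%'").first().getString(0);

    // Extract the bare path, e.g. "hdfs://namenode/user/hive/warehouse/my_table".
    Matcher m = Pattern.compile("Location:\\s*(\\S+)").matcher(locationLine);
    if (!m.find()) {
        throw new IllegalStateException("Could not find table location for " + tableName);
    }

    // Append the static partition spec, e.g. "dt=20160329/hr=21".
    String partitionLocation = m.group(1) + "/" + partitionKey;

    // Drop the partition columns before writing: their values are encoded
    // in the directory path, not in the data files themselves.
    df.drop("dt").drop("hr").write().parquet(partitionLocation);
}

One caveat worth checking: because this writes files directly to storage rather than going through Hive, the new partition may not be visible to queries until it is registered in the metastore (e.g. with ALTER TABLE ... ADD PARTITION or MSCK REPAIR TABLE).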