How to pass multiple columns to the partitionBy method in Spark

I am a newbie in Spark. I want to write DataFrame data into a Hive table. The Hive table is partitioned on multiple columns. Through the Hive metastore client, I am getting the partition columns and passing them as a variable to the partitionBy clause in the write method of the DataFrame.

var1="country","state" (partition column names fetched from the Hive table)
dataframe1.write.partitionBy(s"$var1").mode("overwrite").save(s"$hive_warehouse/$dbname.db/$temp_table/")

When I execute the above code, it gives me the error: partition "country","state" does not exist. I think it is treating "country","state" as a single string.

Can you please help me out?

Sumit D asked Jan 28 '23 16:01



1 Answer

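The partitionBy method takes varargs (String*), not a list, so a single string containing "country","state" is treated as one column name. You can pass the column names as separate arguments: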

dataframe1.write.partitionBy("country","state").mode("overwrite").save(s"$hive_warehouse/$dbname.db/$temp_table/")

Or, in Scala, you can expand a sequence into varargs with : _* like this:

val columns = Seq("country","state")
dataframe1.write.partitionBy(columns:_*).mode("overwrite").save(s"$hive_warehouse/$dbname.db/$temp_table/")
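
If the partition columns arrive as one string, as var1 does in the question, they would need to be split into separate names before the : _* expansion. The following is only a minimal sketch of that step. It runs without Spark: the partitionBy defined here is a hypothetical stand-in that merely mirrors the varargs shape of the real method, and parseColumns is an illustrative helper that assumes the names are comma-separated and optionally double-quoted.

```scala
object PartitionColumnsSketch {
  // Hypothetical stand-in with the same varargs shape as Spark's
  // DataFrameWriter.partitionBy(colNames: String*); it only returns
  // the column names it received so the behavior can be inspected.
  def partitionBy(colNames: String*): Seq[String] = colNames

  // Illustrative helper (an assumption, not part of Spark or Hive):
  // turns "\"country\",\"state\"" into Seq("country", "state").
  def parseColumns(raw: String): Seq[String] =
    raw.split(",").toSeq
      .map(_.trim.stripPrefix("\"").stripSuffix("\""))
      .filter(_.nonEmpty)

  def main(args: Array[String]): Unit = {
    // One string holding both names, as in the question.
    val var1 = "\"country\",\"state\""

    // Passing the string directly gives a single column name.
    println(partitionBy(var1).length) // prints 1

    // Parsing first, then expanding with : _*, gives two column names.
    val columns = parseColumns(var1)
    println(partitionBy(columns: _*).mkString(", ")) // prints country, state
  }
}
```

With a real DataFrame, the same columns value would be passed as dataframe1.write.partitionBy(columns: _*), exactly as in the snippet above.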
Avishek Bhattacharya answered Jun 06 '23 16:06
