I want to create a hive table using my Spark dataframe's schema. How can I do that?
For fixed columns, I can use:
val CreateTable_query = "Create Table my table(a string, b string, c double)"
sparksession.sql(CreateTable_query)
But I have many columns in my dataframe, so is there a way to automatically generate such query?
Hive comes bundled with the Spark library as HiveContext, which inherits from SQLContext. Using HiveContext, you can create and find tables in the HiveMetaStore and write queries on it using HiveQL. Users who do not have an existing Hive deployment can still create a HiveContext. When not configured by the hive-site.
Assuming, you are using Spark 2.1.0 or later and my_DF is your dataframe,
//get the schema split as string with comma-separated field-datatype pairs
StructType my_schema = my_DF.schema();
String columns = Arrays.stream(my_schema.fields())
.map(field -> field.name()+" "+field.dataType().typeName())
.collect(Collectors.joining(","));
//drop the table if already created
spark.sql("drop table if exists my_table");
//create the table using the dataframe schema
spark.sql("create table my_table(" + columns + ")
row format delimited fields terminated by '|' location '/my/hdfs/location'");
//write the dataframe data to the hdfs location for the created Hive table
my_DF.write()
.format("com.databricks.spark.csv")
.option("delimiter","|")
.mode("overwrite")
.save("/my/hdfs/location");
The other method using temp table
my_DF.createOrReplaceTempView("my_temp_table");
spark.sql("drop table if exists my_table");
spark.sql("create table my_table as select * from my_temp_table");
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With