
How to create a Hive table from a Spark dataframe, using its schema?

I want to create a Hive table using my Spark dataframe's schema. How can I do that?

For fixed columns, I can use:

val CreateTable_query = "Create Table my_table(a string, b string, c double)"
sparksession.sql(CreateTable_query)

But my dataframe has many columns, so is there a way to generate such a query automatically?

Asked by lserlohn on Feb 15 '17 at 22:02




1 Answer

Assuming you are using Spark 2.1.0 or later and my_DF is your dataframe:

import java.util.Arrays;
import java.util.stream.Collectors;
import org.apache.spark.sql.types.StructType;

// Build a comma-separated list of "name type" pairs from the schema.
// Note: typeName() gives a usable name for primitive types; nested types
// (array, map, struct) need their full type string instead.
StructType my_schema = my_DF.schema();
String columns = Arrays.stream(my_schema.fields())
                       .map(field -> field.name() + " " + field.dataType().typeName())
                       .collect(Collectors.joining(","));

// Drop the table if it already exists
spark.sql("drop table if exists my_table");
// Create the table using the dataframe schema
spark.sql("create table my_table(" + columns + ") "
    + "row format delimited fields terminated by '|' location '/my/hdfs/location'");
// Write the dataframe data to the HDFS location of the created Hive table
my_DF.write()
     .format("com.databricks.spark.csv")
     .option("delimiter", "|")
     .mode("overwrite")
     .save("/my/hdfs/location");
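One caveat with plain string concatenation: a column name that contains a space or is a reserved word will break the generated statement. Here is a minimal sketch of the string-building step alone, in plain Java without Spark, that wraps each identifier in backticks. The class HiveDdlSketch and its methods quote and buildCreateTable are hypothetical names for illustration, not part of any Spark or Hive API, and the doubled-backtick escaping is an assumption about how your Spark/Hive version parses quoted identifiers.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only (not a Spark or Hive API).
public class HiveDdlSketch {

    // Wrap an identifier in backticks, doubling any backtick inside it
    // (assumed escaping rule for quoted identifiers).
    public static String quote(String identifier) {
        return "`" + identifier.replace("`", "``") + "`";
    }

    // Build a CREATE TABLE statement from ordered (column name -> type) pairs.
    public static String buildCreateTable(String table, Map<String, String> columns) {
        String columnList = columns.entrySet().stream()
                .map(e -> quote(e.getKey()) + " " + e.getValue())
                .collect(Collectors.joining(", "));
        return "CREATE TABLE " + quote(table) + " (" + columnList + ")";
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the columns in insertion order.
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("a", "string");
        columns.put("b", "string");
        columns.put("c", "double");
        System.out.println(buildCreateTable("my_table", columns));
    }
}
```

With a real dataframe, the map would be filled from my_DF.schema().fields() (field name and type string) instead of hard-coded values, and the resulting string passed to spark.sql.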

Alternatively, register the dataframe as a temporary view and create the table from it:

my_DF.createOrReplaceTempView("my_temp_table");
spark.sql("drop table if exists my_table");
spark.sql("create table my_table as select * from my_temp_table");
Answered by somnathchakrabarti on Sep 25 '22 at 13:09