 

Create Hive external table with schema in Spark

I am using Spark 1.6 and I want to create an external Hive table, like I would in a Hive script. To do this, I first read in the partitioned Avro file and get its schema. Now I am stuck: I have no idea how to apply this schema to the table I am creating. I use Scala. Any help appreciated.

G_cy asked Jul 27 '16 16:07



2 Answers

Finally, I solved it myself the old-fashioned way, with the help of the code below:

import com.databricks.spark.avro._

// read the Avro file and extract its schema
val rawSchema = sqlContext.read.avro("Path").schema

// build a "name type" column list: strip leading underscores from names
// (Hive rejects them) and translate Spark's "integer" to Hive's "int"
val schemaString = rawSchema.fields
  .map(field => field.name.replaceAll("""^_""", "") + " " + (field.dataType.typeName match {
    case "integer" => "int"
    case other     => other
  }))
  .mkString(",\n")

val ddl =
  s"""
     |CREATE EXTERNAL TABLE $tablename ($schemaString)
     |PARTITIONED BY (y int, m int, d int, hh int, mm int)
     |STORED AS AVRO
     |-- equivalently: inputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
     |--               outputformat 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
     |LOCATION 'hdfs://$path'
   """.stripMargin

Take care that no column name can start with _, and that Hive can't parse integer (it must be mapped to int). This approach is not flexible, but it works. If anyone has a better idea, please comment.
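One caveat worth adding (a sketch, not part of the answer above; the table name is hypothetical): creating the external table does not register the existing partition directories, so queries return nothing until the partitions are added. Assuming the data is laid out in Hive's key=value style (e.g. .../y=2016/m=7/d=27/hh=16/mm=0), MSCK REPAIR TABLE can discover them:

```scala
// Hypothetical follow-up step after running the generated DDL.
val tableName = "my_avro_table" // hypothetical table name
val repairSql = s"MSCK REPAIR TABLE $tableName"
// sqlContext.sql(ddl)       // create the external table first
// sqlContext.sql(repairSql) // then let Hive discover the partition dirs
```

If the directories do not follow the key=value convention, each partition has to be added explicitly with ALTER TABLE ... ADD PARTITION ... LOCATION instead.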

G_cy answered Nov 13 '22 05:11



I didn't see a way to automatically infer the schema for external tables, so I created a case for the string type; you could add cases for your own data types. I'm not sure how many columns you have, and I apologize that this might not be the cleanest approach.

import org.apache.spark.sql.SaveMode

val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
val results = hiveContext.read.format("com.databricks.spark.avro").load("people.avro")

// map each Spark type to a Hive type; only StringType is handled here,
// so any other type will throw a MatchError until you add a case for it
val schema = results.schema
  .map(x => x.name + " " + (x.dataType.toString() match { case "StringType" => "STRING" }))
  .mkString(",")

val hive_sql = "CREATE EXTERNAL TABLE people_and_age (" + schema + ") " +
  "ROW FORMAT DELIMITED " +
  "FIELDS TERMINATED BY ',' " +
  "LOCATION '/user/ravi/people_age'"

hiveContext.sql(hive_sql)
results.write.mode(SaveMode.Overwrite).saveAsTable("people_age")
hiveContext.sql("select * from people_age").show()
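To extend the single StringType case above, here is a sketch of what a fuller mapping might look like (the function name is mine; the Hive type names are standard, but which Spark type strings you actually encounter depends on your data):

```scala
// Hedged sketch: map Spark SQL type names (DataType.toString) to Hive DDL types.
// Covers common primitives; anything unmatched falls through unchanged, so a
// missing case shows up visibly in the generated DDL instead of a MatchError.
def toHiveType(sparkType: String): String = sparkType match {
  case "StringType"    => "STRING"
  case "IntegerType"   => "INT"
  case "LongType"      => "BIGINT"
  case "DoubleType"    => "DOUBLE"
  case "FloatType"     => "FLOAT"
  case "BooleanType"   => "BOOLEAN"
  case "BinaryType"    => "BINARY"
  case "TimestampType" => "TIMESTAMP"
  case other           => other
}
```

You would then use `toHiveType(x.dataType.toString())` in place of the inline match when building the schema string.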
Ravi answered Nov 13 '22 03:11
