For example I have few Hive HQL statements which I want to pass into Spark SQL:
set parquet.compression=SNAPPY;
create table MY_TABLE stored as parquet as select * from ANOTHER_TABLE;
select * from MY_TABLE limit 5;
Following doesn't work:
hiveContext.sql("set parquet.compression=SNAPPY; create table MY_TABLE stored as parquet as select * from ANOTHER_TABLE; select * from MY_TABLE limit 5;")
How to pass the statements into Spark SQL?
I worked on a scenario where i needed to read a sql file and run all the; separated queries present in that file.
One simple way to do it is like this:
val hsc = new org.apache.spark.sql.hive.HiveContext(sc)
val sql_file = "/hdfs/path/to/file.sql"
val file = sc.wholeTextFiles(s"$sql_file")
val queries = f.take(1)(0)._2
Predef.refArrayOps(queries.split(';')).map(query => hsc.sql(query))
Thank you to @SamsonScharfrichter for the answer.
This will work:
hiveContext.sql("set spark.sql.parquet.compression.codec=SNAPPY")
hiveContext.sql("create table MY_TABLE stored as parquet as select * from ANOTHER_TABLE")
val rs = hiveContext.sql("select * from MY_TABLE limit 5")
Please note that in this particular case instead of parquet.compression key we need to use spark.sql.parquet.compression.codec
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With