 

Spark: Hive Query

I have a log file whose first column should become the partition column of my Hive table.

    logSchemaRDD.registerTempTable("logs")

    hiveContext.sql("insert overwrite table logs_parquet PARTITION(create_date=select ? from logs) select * from logs")

How do I write the query so that it selects the first column (marked as ? here) as the partition value, and how do I make sure the column used in the PARTITION clause matches the second select (*)?

asked Apr 14 '26 13:04 by sophie



1 Answer

You need to explicitly enumerate the columns in both the source and target lists; in this case select * will not suffice.

    insert overwrite table logs_parquet PARTITION(create_date) (col2, col3, ..)
    select col2, col3, .., col1 from logs

Yes, it is more work to write the query, but partitioned inserts do require an explicit mapping of the columns, with the partition columns last.
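To avoid typing the column list by hand, it can be generated from the source column names. The following is a minimal sketch, not part of the original answer: the helper build_partition_insert and the column names create_date, host and message are hypothetical. It assumes, as in the question, that the first source column holds the partition value, and that logs_parquet's non-partition columns are in the same order as the remaining log columns. Unlike the answer's query, it omits the target column list and relies on Hive mapping the SELECT expressions to the table columns by position, with the dynamic partition column taken from the last one. Because PARTITION(create_date) has no static value, Hive's dynamic partitioning must be enabled in nonstrict mode, which is what the two SET statements are for.

```python
# Hive settings needed when the partition value comes from the data
# (no static value in PARTITION(...)): dynamic partitioning in nonstrict mode.
DYNAMIC_PARTITION_SETTINGS = [
    "SET hive.exec.dynamic.partition=true",
    "SET hive.exec.dynamic.partition.mode=nonstrict",
]


def build_partition_insert(source_columns, source, target, target_partition):
    """Return an INSERT OVERWRITE ... SELECT statement for a dynamic partition.

    Hypothetical helper. Assumes (as in the question) that the FIRST source
    column holds the partition value. Hive fills the dynamic partition column
    from the LAST expression of the SELECT, so that column is moved to the end.
    """
    if not source_columns:
        raise ValueError("source_columns must not be empty")
    partition_src, *data_cols = source_columns
    # Data columns keep their original order; the partition column goes last.
    select_list = ", ".join(data_cols + [partition_src])
    return (
        f"INSERT OVERWRITE TABLE {target} PARTITION ({target_partition}) "
        f"SELECT {select_list} FROM {source}"
    )


# Hypothetical column names, for illustration only.
query = build_partition_insert(
    ["create_date", "host", "message"], "logs", "logs_parquet", "create_date"
)
print(query)
# INSERT OVERWRITE TABLE logs_parquet PARTITION (create_date) SELECT host, message, create_date FROM logs

# In Spark, each statement would then be run through the context, e.g.:
# for stmt in DYNAMIC_PARTITION_SETTINGS + [query]:
#     hiveContext.sql(stmt)
```

With a Spark DataFrame, the column names could come from its columns attribute instead of a hard-coded list, so the generated SELECT stays in sync with the log schema.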

answered Apr 17 '26 09:04 by WestCoastProjects



