I have a log file, and its first column should become the partition column of a Hive table.
logSchemaRDD.registerTempTable("logs")
hiveContext.sql("insert overwrite table logs_parquet PARTITION(create_date=select ? from logs) select * from logs")
How do I construct the query so that it selects the first column (marked as ? here) for the partition, and how do I make sure the column chosen for the partition lines up with the second select (*)?
You need to explicitly enumerate the columns in both the source and target list: in this case select * will not suffice.
insert overwrite table logs_parquet PARTITION(create_date) (col2, col3..)
select col2,col3, .. col1 from logs
Yes, it is more work to write the query, but inserts into a partitioned table do require an explicit mapping of the columns, with the partitioning columns listed last in the select.
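If typing the column list by hand gets tedious, the reordering can be scripted. Below is a minimal sketch in plain Python that moves the partition source column to the end of the select list and builds the statement as a string. The helper name build_partitioned_insert and the column names col1, col2, col3 are hypothetical placeholders, not taken from the real log schema. The sketch also leaves out the target column list and relies on positional mapping alone, which is an assumption about what your Hive/Spark version accepts.

```python
def build_partitioned_insert(target, source, columns, partition_col, source_partition_col):
    """Build an INSERT OVERWRITE statement with the partition column selected last.

    columns              -- source column names, in their original order
    partition_col        -- partition column name in the target table
    source_partition_col -- source column that supplies the partition value
    """
    if source_partition_col not in columns:
        raise ValueError("unknown column: %s" % source_partition_col)
    # Keep the data columns in their original order, then append the
    # partition source column so that it is the last item selected.
    data_cols = [c for c in columns if c != source_partition_col]
    select_list = ", ".join(data_cols + [source_partition_col])
    return "INSERT OVERWRITE TABLE %s PARTITION (%s) SELECT %s FROM %s" % (
        target, partition_col, select_list, source)


# Hypothetical column names; substitute the real ones from the log schema.
query = build_partitioned_insert(
    "logs_parquet", "logs", ["col1", "col2", "col3"], "create_date", "col1")
print(query)
```

The resulting string can then be passed to hiveContext.sql(query). Depending on the configuration, dynamic partitioning may first need to be enabled with hive.exec.dynamic.partition=true and hive.exec.dynamic.partition.mode=nonstrict.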