Spark: Hive Query

Question

I have a log file whose first column should become the partition column of a Hive table.

    logSchemaRDD.registerTempTable("logs")

    hiveContext.sql("insert overwrite table logs_parquet PARTITION(create_date=select ? from logs) select * from logs")

How do I construct the query so that the first column (marked as ? here) is used as the partition value, and how do I make sure the partition value matches the rows returned by the second select (*)?

WestCoastProjects · Accepted Answer

You need to explicitly enumerate the columns in both the source and the target list; in this case select * will not suffice.

    insert overwrite table logs_parquet PARTITION(create_date) (col2, col3..)
    select col2,col3, .. col1 from logs

Yes, it is more work to write the query, but partitioned inserts do require an explicit mapping of the columns, with the partitioning columns last.
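To avoid typing the column list by hand, the query string can be assembled from a list of column names. The following is only a minimal sketch: the helper `build_partitioned_insert` and the column names `host` and `message` are hypothetical, and it assumes the partition column has the same name (`create_date`) in both the source and the target table.

```python
def build_partitioned_insert(target, source, columns, partition_col):
    """Build a dynamic-partition INSERT with the partition column listed last."""
    if partition_col not in columns:
        raise ValueError(f"{partition_col!r} is not a column of {source}")
    # Hive matches dynamic partition values by position, so the partition
    # column is moved to the end of the select list.
    data_cols = [c for c in columns if c != partition_col]
    select_list = ", ".join(data_cols + [partition_col])
    return (
        f"INSERT OVERWRITE TABLE {target} "
        f"PARTITION ({partition_col}) "
        f"SELECT {select_list} FROM {source}"
    )


# Hypothetical column names; create_date is the first column of the log file.
query = build_partitioned_insert(
    "logs_parquet", "logs",
    ["create_date", "host", "message"], "create_date",
)
print(query)
```

The resulting string can then be passed to `hiveContext.sql(query)`. A fully dynamic partition insert like this one typically also needs `SET hive.exec.dynamic.partition=true` and `SET hive.exec.dynamic.partition.mode=nonstrict` to be issued first.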

Tags:

apache-spark

apache-spark-sql

hive

parquet

hiveql

sophie

1 Answer

WestCoastProjects
