As per this AWS Forum Thread, does anyone know how to use AWS Glue to create an AWS Athena table whose partitions contain different schemas (in this case different subsets of columns from the table schema)?
At the moment, when I run the crawler over this data and then run a query in Athena, I get the error 'HIVE_PARTITION_SCHEMA_MISMATCH'.
My use case is:
If I were to manually write a schema I could do this fine as there would just be one table schema, and keys which are missing in the JSON file would be treated as Nulls.
Thanks in advance!
To get started, sign in to the AWS Management Console and open the AWS Glue console at https://console.aws.amazon.com/glue/. Choose the Tables tab, and use the Add tables button to create tables either with a crawler or by manually typing attributes.
AWS Glue partition indexes are an important configuration to reduce overall data transfers and processing, and reduce query processing time. In the AWS Glue Data Catalog, the GetPartitions API is used to fetch the partitions in the table. The API returns partitions that match the expression provided in the request.
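As a rough illustration of the GetPartitions call described above, here is a minimal Python sketch. It assumes a boto3 Glue client is passed in by the caller; the function name `list_partition_values` and the database, table, and expression values in the usage note are hypothetical and not taken from the original post.

```python
from typing import Any, List


def list_partition_values(
    glue_client: Any, database: str, table: str, expression: str
) -> List[List[str]]:
    """Collect the partition values of every partition matching `expression`.

    `glue_client` is assumed to be a boto3 Glue client (boto3.client("glue")).
    """
    # GetPartitions is paginated, so walk every page of results.
    paginator = glue_client.get_paginator("get_partitions")
    values: List[List[str]] = []
    for page in paginator.paginate(
        DatabaseName=database, TableName=table, Expression=expression
    ):
        for partition in page.get("Partitions", []):
            values.append(partition["Values"])
    return values
```

With real credentials this might be called as `list_partition_values(boto3.client("glue"), "my_db", "my_table", "year = '2020'")`, where all three string arguments are placeholders.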
To create a table using the AWS Glue crawler, open the Athena console at https://console.aws.amazon.com/athena/. In the query editor, next to Tables and views, choose Create, and then choose AWS Glue crawler. Follow the steps on the Add crawler page of the AWS Glue console to add a crawler.
I had the same issue and solved it by configuring the crawler to update table metadata for preexisting partitions.
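The answer's original illustration did not survive, so the following is only a sketch of one way to set that option programmatically, not necessarily what the answerer did. It assumes boto3 is installed and credentials are configured, and it uses the crawler configuration key `AddOrUpdateBehavior` with the value `InheritFromTable`, which tells the crawler to have partitions inherit their schema from the table. The function names and the crawler name in the usage note are hypothetical.

```python
import json


def build_partition_inherit_config() -> str:
    """Return crawler configuration JSON that makes partitions inherit the table schema."""
    config = {
        "Version": 1.0,
        "CrawlerOutput": {
            # Partitions take their metadata from the table instead of
            # keeping a separately inferred per-partition schema.
            "Partitions": {"AddOrUpdateBehavior": "InheritFromTable"},
        },
    }
    return json.dumps(config)


def apply_to_crawler(crawler_name: str) -> None:
    """Apply the configuration to an existing crawler (requires AWS access)."""
    # Imported here so that building the JSON above does not require boto3.
    import boto3

    glue = boto3.client("glue")
    glue.update_crawler(
        Name=crawler_name,
        Configuration=build_partition_inherit_config(),
    )
```

Calling `apply_to_crawler("my-json-crawler")` (a placeholder name) and then re-running the crawler should bring the partition schemas in line with the table schema. If partitions created earlier still trigger the mismatch error, they may need to be dropped and crawled again.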