
How to create AWS Glue table where partitions have different columns? ('HIVE_PARTITION_SCHEMA_MISMATCH')

As per this AWS Forum Thread, does anyone know how to use AWS Glue to create an AWS Athena table whose partitions contain different schemas (in this case different subsets of columns from the table schema)?

At the moment, when I run the crawler over this data and then query the table in Athena, I get the error 'HIVE_PARTITION_SCHEMA_MISMATCH'.

My use case is:

  • Partitions represent days
  • Files represent events
  • Each event is a JSON blob in a single S3 file
  • An event contains a subset of columns (dependent on the type of event)
  • The 'schema' of the entire table is the full set of columns for all the event types (this is correctly put together by Glue crawler)
  • The 'schema' of each partition is the subset of columns for the event types that occurred on that day (hence in Glue each partition potentially has a different subset of columns from the table schema)
  • I think this inconsistency is what causes the error in Athena
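One way to check whether the partitions really diverge from the table is to compare their column lists in the Glue Data Catalog. The following is a minimal sketch, not a definitive diagnostic: `find_mismatched_partitions` and `fetch_and_compare` are hypothetical helper names, the database and table names are whatever the crawler created, and the input dictionaries are assumed to have the shape returned by boto3's Glue `get_table` and `get_partitions` calls.

```python
def find_mismatched_partitions(table_columns, partitions):
    """Report partitions whose columns differ from the table-level columns.

    table_columns: list of {"Name": ..., "Type": ...} dicts (table schema).
    partitions: list of partition dicts, each with "Values" and
                "StorageDescriptor" -> "Columns".
    """
    table_cols = {c["Name"]: c["Type"] for c in table_columns}
    report = {}
    for partition in partitions:
        part_cols = {
            c["Name"]: c["Type"]
            for c in partition["StorageDescriptor"]["Columns"]
        }
        shared = set(table_cols) & set(part_cols)
        differences = {
            # Columns in the table schema that the partition does not list.
            "missing": sorted(set(table_cols) - set(part_cols)),
            # Columns the partition lists that the table schema does not.
            "extra": sorted(set(part_cols) - set(table_cols)),
            # Columns present in both but declared with different types.
            "retyped": sorted(n for n in shared if table_cols[n] != part_cols[n]),
        }
        if any(differences.values()):
            report[tuple(partition["Values"])] = differences
    return report


def fetch_and_compare(database, table):
    """Fetch schemas from the Glue Data Catalog and compare them.

    Requires boto3 and AWS credentials; imported lazily so the comparison
    logic above can be used without them.
    """
    import boto3

    glue = boto3.client("glue")
    table_columns = glue.get_table(DatabaseName=database, Name=table)[
        "Table"
    ]["StorageDescriptor"]["Columns"]

    partitions = []
    paginator = glue.get_paginator("get_partitions")
    for page in paginator.paginate(DatabaseName=database, TableName=table):
        partitions.extend(page["Partitions"])

    return find_mismatched_partitions(table_columns, partitions)
```

Calling `fetch_and_compare("my_database", "events")` (both names are placeholders) would return a dictionary keyed by partition values, listing which columns are missing, extra, or declared with a different type in each partition.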

If I were to write the schema manually I could do this fine, as there would be just one table schema, and keys missing from a JSON file would be treated as nulls.
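To illustrate the behaviour I am after, here is a small sketch of reading events against one table-wide schema. The column names in `TABLE_COLUMNS` and the `project_event` helper are made up for the example; the point is only that a key absent from an event comes back as a null (`None`) rather than changing the schema.

```python
import json

# Hypothetical full table schema: the union of columns over all event types.
TABLE_COLUMNS = ["event_type", "user_id", "page", "amount"]


def project_event(raw_json, columns=TABLE_COLUMNS):
    """Read one JSON event against the table schema.

    Keys missing from the event become None; keys not in the schema
    are ignored.
    """
    event = json.loads(raw_json)
    return {name: event.get(name) for name in columns}
```

A "click" event that carries only `event_type` and `page` would therefore still produce a row with all four columns, with `user_id` and `amount` set to `None`.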

Thanks in advance!

rjmurt asked Sep 15 '17 13:09




1 Answer

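I had the same issue and solved it by configuring the crawler to update the metadata of preexisting partitions from the table. In the Glue console this is the crawler output option worded along the lines of "Update all new and existing partitions with metadata from the table".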
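The same setting can also be applied through the API instead of the console. The sketch below assumes boto3 and an existing crawler; the helper names `inherit_from_table_configuration` and `apply_to_crawler` are hypothetical, while the `AddOrUpdateBehavior: InheritFromTable` value is the crawler configuration option AWS documents for making partitions inherit the table's metadata.

```python
import json


def inherit_from_table_configuration():
    """Build the crawler Configuration JSON string that makes partitions
    inherit their metadata (including columns) from the table."""
    return json.dumps(
        {
            "Version": 1.0,
            "CrawlerOutput": {
                "Partitions": {"AddOrUpdateBehavior": "InheritFromTable"}
            },
        }
    )


def apply_to_crawler(crawler_name):
    """Set the configuration on an existing crawler.

    Requires boto3 and AWS credentials; imported lazily so the JSON
    builder above works without them. Note that this passes a complete
    Configuration string, so any other options already stored in the
    crawler's configuration would need to be merged in first.
    """
    import boto3

    glue = boto3.client("glue")
    glue.update_crawler(
        Name=crawler_name,
        Configuration=inherit_from_table_configuration(),
    )
```

After calling `apply_to_crawler("my-crawler")` (a placeholder name), the crawler has to be run again before the existing partitions pick up the table's schema.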


Mario Filipović answered Sep 24 '22 06:09