How to change column names of autodetected partitions created by Glue Crawler?

I have a bucket which is used as the destination for a Kinesis Firehose stream.

Firehose automatically creates date-based prefixes on that bucket using the yyyy/mm/dd/HH format.

Then I created a crawler that searches for data in this bucket and configured it as follows:

[Screenshot: crawler configuration]

After running the crawler, it creates a table with the following schema:

| #   | Column name   | Data type | Key           |
| --- | -----------   | --------- | ------------- |
| 1   | numberissues  | int       |               |
| 2   | group         | string    |               |
| 3   | createdat     | string    |               |
| 4   | companyunitid | string    |               |
| 5   | partition_0   | string    | Partition (0) |
| 6   | partition_1   | string    | Partition (1) |
| 7   | partition_2   | string    | Partition (2) |
| 8   | partition_3   | string    | Partition (3) |

If I rename the partition_* columns to their proper counterparts (year, month, day and hour), the table is ready for me to use.

However, if the crawler runs again, the schema reverts the column names to the original partition_*.

I know this would work for Hive partition schemas year=2018/month=04..., but I want to know if it's possible to "hint" Glue about the partition field names.
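To make the difference between the two layouts concrete, here is a minimal sketch that turns a positional prefix like the one Firehose writes into the Hive-style form. `to_hive_prefix` is a hypothetical helper used only for illustration; it is not part of Glue or Firehose.

```python
def to_hive_prefix(prefix, names=("year", "month", "day", "hour")):
    """Convert a positional prefix such as '2018/04/06/19' to Hive style.

    The column names are an assumption matching the yyyy/mm/dd/HH layout
    described above.
    """
    parts = prefix.strip("/").split("/")
    if len(parts) != len(names):
        raise ValueError(
            f"expected {len(names)} path segments, got {len(parts)}"
        )
    # Hive-style paths carry the column name in every path segment.
    return "/".join(f"{name}={value}" for name, value in zip(names, parts)) + "/"


if __name__ == "__main__":
    print(to_hive_prefix("2018/04/06/19"))
```

With the Hive-style form the crawler can read the column names from the path itself, which is why no renaming is needed in that case.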

Another alternative would be trying to change the Firehose prefixing, but I couldn't find anything that suggests this is even possible.

962

asked Apr 06 '18 19:04

Henrique Barcelos

1 Answer

In this case you can set the "Ignore the change and don't update the data catalog" option.

Then you can rename the columns. This will allow the crawler to detect new partitions on the next run but keep the renamed columns.
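The same two steps can also be scripted. The following is a minimal sketch, assuming a boto3 Glue client is passed in (for example `boto3.client("glue")`) and assuming that `UpdateBehavior="LOG"` is the API value behind the console option quoted above. The function names, `TABLE_INPUT_FIELDS`, and the crawler, database and table names in the usage note are hypothetical; check them against your own setup before running it.

```python
import copy

# get_table returns read-only fields (CreateTime, DatabaseName, ...) that
# update_table rejects. Assumption: this subset of TableInput fields is
# enough for a crawler-created table.
TABLE_INPUT_FIELDS = (
    "Name", "Description", "Owner", "Retention",
    "StorageDescriptor", "PartitionKeys", "TableType", "Parameters",
)


def build_renamed_table_input(table, new_names):
    """Return a TableInput dict with partition keys renamed by position."""
    keys = table.get("PartitionKeys", [])
    if len(keys) != len(new_names):
        raise ValueError(
            f"table has {len(keys)} partition keys, got {len(new_names)} names"
        )
    # Deep-copy so the dict returned by get_table is left untouched.
    table_input = {
        field: copy.deepcopy(table[field])
        for field in TABLE_INPUT_FIELDS
        if field in table
    }
    for key, name in zip(table_input.get("PartitionKeys", []), new_names):
        key["Name"] = name
    return table_input


def keep_renamed_partitions(glue, crawler_name, database, table_name, new_names):
    """Stop the crawler from rewriting the schema, then rename the keys."""
    # Step 1: tell the crawler to ignore schema changes on existing tables.
    glue.update_crawler(
        Name=crawler_name,
        SchemaChangePolicy={"UpdateBehavior": "LOG"},
    )
    # Step 2: rename partition_0..partition_N on the existing table.
    table = glue.get_table(DatabaseName=database, Name=table_name)["Table"]
    glue.update_table(
        DatabaseName=database,
        TableInput=build_renamed_table_input(table, new_names),
    )
```

A call might look like `keep_renamed_partitions(boto3.client("glue"), "my-crawler", "my_db", "my_table", ["year", "month", "day", "hour"])`, where all four names are placeholders.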

188

answered Sep 19 '22 18:09

Ricardo Mayerhofer

Tags:

amazon-kinesis-firehose

amazon-athena

aws-glue
