Today, I saw myself with a simple problem, renaming column of an Athena glue table from old to new name.
First thing, I search here and tried some solutions like this, this, and many others. Unfortunately, none works, so I decided to use my knowledge and imagination.
I'm posting this question with the intention of share, but also, with the intention to get how others did and maybe find out I reinvented the wheel. So please also share your way if you know how to do it.
My setup is, a Athena JSON table partitioned by day with valuable and enormous amount of data, the infrastructure is defined and updated through Cloudformation.
How to rename an Athena column and still keep the data?
Explaining without all the cloudformation infrastructure.
Imagine a table containing:
Partitioned by dt_utc and stored using JSON format. Wee need to change the column score to deltaScore.
Keep in mind, although I haven't tested with others format/configurations, this should apply to any configuration supported by athena as we are going to use athena algorithm to do the job for us.
if you run the cloudformation migration first, you gonna "lose" access to the dropped column.
but you can simply rename the column back and the data appears.
Those are the steps required for rename a AWS Athena table:
CREATE TABLE "temp_table_rename"
WITH(
format = 'JSON',
external_location = 's3://bucket_name/A_folder/temp_table_rename/',
partitioned_by = ARRAY['dt_utc']
)
AS
SELECT DISTINCT
userid,
score as deltascore,
otherColumns,
eventDateUtc,
"dt_utc"
FROM "my_database"."original_table"
Apply the database rename by running the cloudformation with the changes or on the way you have.
At this point, you can even drop the original_table, and create again using the right column name.
After rename, you will notice that the renamed column have no data.
Remove the data of the original table by deleting it's s3 source.
Copy the data from the temp table source to the original table source
I prefer to use a aws command as, there can be thousands of files to copy
aws s3 cp s3://bucket_name/A_folder/temp_table_rename/ s3://bucket_name/A_folder/original_table/ --recursive
MSCK REPAIR TABLE "my_database"."original_table"done.
Using CREATE TABLE AS to do the transformation job, allow you to do much more than only renaming the column, for example split the data of a column into 2 new columns, or merge it to a single one.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With