What happens when after creating the table in AWS Athena for files on S3, the structure of the files on S3 change?
For eg:
If the files previously had 5 columns when the table was created and later the new files started getting 1 more column: a) at the end? b) in between?
What happens when some columns are not available in new files?
What happens when the columns remain the same but the column order changes?
Can we alter Athena tables to adjust to these changes?
1 - Athena is not a NoSQL solution. It is not dynamic schema either. If you change the schema, all your files in a particular folder should reflect that change. Athena wont magically update to have it included.
2 - Then it'll be a problem and it'll break. You should include NULL
or ,,
to force it to be okay.
3 - Athena picks it up by column order. Not by name, really. If your column orders change, it'll probably break (different types).
4 - Yes. You can always easily recreate Athena tables by dropping it and creating a new one.
If you have variable length files, then you should insert them into different folders so that each folder represents one consistent schema. You can then unify this later on in Athena with a union
or similar to create a condensed, simplified table that you can apply the consistent schema to.
It depends on the files format you are using and the setup (if the schema is by field order or by field name). All the details are here: https://docs.aws.amazon.com/athena/latest/ug/handling-schema-updates-chapter.html Take a big note that if the data is nested or in arrays, it will completely break your data, to quote from this page:
Schema updates described in this section do not work on tables with complex or nested data types, such as arrays and structs.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With