
Amazon Athena - How can I exclude the metadata files when creating a table based on query results?

In Athena, I want to create a table based on query results, but every query result consists of two files, ".csv" and ".csv.metadata". All of these files end up in my table, and the metadata makes the table look messy. Is there any way to ignore the ".csv.metadata" files and show only the data from the ".csv" files?

Any suggestions or code snippets would be appreciated.

Thank you.

Hilda Chang asked Mar 23 '18 07:03


3 Answers

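You can exclude input files at query time by filtering on the "$PATH" pseudo-column, which holds the S3 location of the file each row was read from: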

select * from your_table where "$PATH" not like '%metadata'
Nicolas Busca answered Nov 14 '22 08:11


Adding an underscore at the beginning of the filename will cause Athena to ignore the file. For example: _ignoredfile.csv.metadata
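
As a rough sketch of how the renaming could be scripted: S3 has no rename operation, so each object is copied to the new key and the old key is deleted. The function names `hidden_key` and `hide_metadata_files`, the bucket name, and the prefix are all hypothetical, and the boto3 part assumes credentials with permission to list, copy, and delete objects.

```python
import posixpath


def hidden_key(key: str) -> str:
    """Return the S3 key with an underscore prefixed to its file name."""
    directory, filename = posixpath.split(key)
    # Leave "folder" keys and already-hidden files untouched.
    if not filename or filename.startswith("_"):
        return key
    return posixpath.join(directory, "_" + filename)


def hide_metadata_files(bucket: str, prefix: str) -> None:
    """Rename every '.metadata' object under prefix (copy, then delete)."""
    import boto3  # imported here so hidden_key works without boto3 installed

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            new_key = hidden_key(key)
            if key.endswith(".metadata") and new_key != key:
                s3.copy_object(
                    Bucket=bucket,
                    Key=new_key,
                    CopySource={"Bucket": bucket, "Key": key},
                )
                s3.delete_object(Bucket=bucket, Key=key)


# Hypothetical usage:
# hide_metadata_files("my-athena-results", "results/")
```

Note that this only handles files that already exist. If new query results keep landing in the same location, the script would have to be run again.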

Oren answered Nov 14 '22 08:11


It can't be done. From the documentation:

Athena reads all files in an Amazon S3 location you specify in the CREATE TABLE statement, and cannot ignore any files included in the prefix. When you create tables, include in the Amazon S3 path only the files you want Athena to read. Use AWS Lambda functions to scan files in the source location, remove any empty files, and move unneeded files to another location.
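
The "move unneeded files to another location" step from the quoted documentation might look roughly like the sketch below. It is only an illustration: `plan_moves`, `apply_moves`, and the `results/` and `archive/` prefixes are hypothetical names, and the same logic could be run from a Lambda function or from a plain script.

```python
def plan_moves(keys, table_prefix, archive_prefix, suffix=".metadata"):
    """Pair each unwanted key under table_prefix with a key under archive_prefix."""
    moves = []
    for key in keys:
        if key.startswith(table_prefix) and key.endswith(suffix):
            relative = key[len(table_prefix):]
            moves.append((key, archive_prefix + relative))
    return moves


def apply_moves(bucket, moves):
    """Carry out the planned moves in S3 (copy to the new key, delete the old)."""
    import boto3  # imported here so plan_moves works without boto3 installed

    s3 = boto3.client("s3")
    for source_key, target_key in moves:
        s3.copy_object(
            Bucket=bucket,
            Key=target_key,
            CopySource={"Bucket": bucket, "Key": source_key},
        )
        s3.delete_object(Bucket=bucket, Key=source_key)


# Hypothetical usage: only the .metadata file is scheduled to move.
planned = plan_moves(
    ["results/a.csv", "results/a.csv.metadata"],
    table_prefix="results/",
    archive_prefix="archive/",
)
```

Keeping the planning step separate from the S3 calls makes it possible to inspect the list of moves before anything is deleted. The archive prefix must lie outside the table's location, otherwise Athena will still read the moved files.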

Dan Hook answered Nov 14 '22 10:11