Selecting specific files for Athena

Question

When creating a table in Athena, I am not able to build it from specific files only. Is there any way to select all the files starting with "year_2019" from a given bucket? For example: s3://bucketname/prefix/year_2019*.csv

The documentation is very clear that this is not allowed.

From: https://docs.aws.amazon.com/athena/latest/ug/tables-location-format.html

Athena reads all files in an Amazon S3 location you specify in the CREATE TABLE statement, and cannot ignore any files included in the prefix. When you create tables, include in the Amazon S3 path only the files you want Athena to read. Use AWS Lambda functions to scan files in the source location, remove any empty files, and move unneeded files to another location.

I would like to know if the community has found a workaround :)

Theo · Accepted Answer

Unfortunately, the filesystem abstraction that Athena uses for S3 doesn't support this. It requires table locations to look like directories, and Athena appends a slash to the location when listing files.

There is a way to create tables that contain only a selection of files, but as far as I know it does not support wildcards, only explicit lists of files.

To do this, create a table with

STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'

and then, instead of pointing the table's LOCATION at the actual data files, point it at a prefix that contains a single symlink.txt file (or point each partition at a prefix containing a single symlink.txt). In symlink.txt, list the S3 URIs of the files to include in the table, one per line.
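Since symlink.txt takes explicit URIs rather than wildcards, the wildcard matching has to happen before the file is written. Here is a minimal sketch of that step in Python, under some assumptions: the function names, the example table and column names, and the manifest prefix are all hypothetical, and the SerDe and OUTPUTFORMAT lines in the DDL are taken from the S3 Inventory example for CSV, so adjust them to your own data format.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable


def build_symlink_manifest(bucket: str, keys: Iterable[str], pattern: str) -> str:
    """Return the contents of symlink.txt: one S3 URI per line.

    `keys` are object keys you have already listed from the bucket.
    `pattern` is a shell-style wildcard matched against the whole key;
    note that `*` also matches `/`, so it can cross "directory" levels.
    """
    selected = sorted(k for k in keys if fnmatchcase(k, pattern))
    lines = [f"s3://{bucket}/{key}" for key in selected]
    return "\n".join(lines) + ("\n" if lines else "")


def build_create_table(table: str, columns: Dict[str, str], manifest_location: str) -> str:
    """Return a CREATE EXTERNAL TABLE statement for a symlink-based table.

    `manifest_location` is the prefix that holds symlink.txt, not the
    prefix that holds the data files. The SerDe and OUTPUTFORMAT below
    are assumptions borrowed from the S3 Inventory CSV example.
    """
    cols = ",\n  ".join(f"`{name}` {col_type}" for name, col_type in columns.items())
    return (
        f"CREATE EXTERNAL TABLE {table} (\n  {cols}\n)\n"
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'\n"
        "STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'\n"
        "OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'\n"
        f"LOCATION '{manifest_location}'"
    )


if __name__ == "__main__":
    # Hypothetical keys, as they might come back from listing s3://bucketname/prefix/
    example_keys = [
        "prefix/year_2018_12.csv",
        "prefix/year_2019_01.csv",
        "prefix/year_2019_02.csv",
        "prefix/notes.txt",
    ]
    print(build_symlink_manifest("bucketname", example_keys, "prefix/year_2019*.csv"))
    print(build_create_table(
        "year_2019_files",                       # hypothetical table name
        {"id": "string", "value": "string"},     # hypothetical columns
        "s3://bucketname/manifests/year_2019/",  # hypothetical manifest prefix
    ))
```

The manifest string would then be uploaded as symlink.txt under the manifest prefix (for example with boto3's put_object), and the generated statement run in Athena. The manifest has to be rebuilt whenever new matching files arrive, since Athena reads only the URIs listed in it.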

The only documentation that I know of for this feature is the S3 Inventory documentation for integrating with Athena.

You can also find a full example in this Stack Overflow answer: https://stackoverflow.com/a/55069330/1109
