 

Amazon AWS Athena S3 and Glacier Mixed Bucket

Amazon Athena Log Analysis Services with S3 Glacier

We have petabytes of data in S3. We are PubNub (https://www.pubnub.com/) and we store our network's usage data in S3 for billing purposes. The data is in tab-delimited log files in a single S3 bucket. When we query it, Athena fails with a HIVE_CURSOR_ERROR.

Our S3 bucket is set up to automatically transition objects to AWS Glacier after 6 months. As a result, the bucket holds both S3 files that are hot and ready to read and files that have been moved to Glacier. We are getting access errors from Athena because of this: the file referenced in the error is a Glacier backup.

My guess is the answer will be: don't keep Glacier backups in the same bucket. That is not an easy option for us given our data volume. I believe Athena will not work in this setup and that we will not be able to use it for our log analysis.

However, if there is a way we can use Athena, we would be thrilled. Is there a solution to the HIVE_CURSOR_ERROR and a way to skip Glacier files? Our S3 bucket is a flat bucket without folders.

AWS Athena S3 Operation Exception

The S3 object name has been omitted from the screenshots above and below. The file referenced in the HIVE_CURSOR_ERROR is in fact the Glacier object, as you can see in this screenshot of our S3 bucket.

Amazon S3 Bucket object in Glacier accessed by Athena

Note: I tried to post on https://forums.aws.amazon.com/, but that was no bueno.


Stephen Blum asked Jan 25 '17 22:01



3 Answers

You must have an S3 bucket to work with. In addition, the AWS account that you use to initiate an S3 Glacier Select job must have write permissions for the S3 bucket. The Amazon S3 bucket must be in the same AWS Region as the vault that contains the archive object that is being queried.

S3 Glacier Select runs the query and stores the results in an S3 bucket.

Bottom line: you must move the data into an S3 bucket to use the S3 Glacier Select statement. Then use Athena on the 'new' S3 bucket.

user14478563 answered Oct 17 '22 09:10



The AWS documentation dated May 16, 2017 states specifically that Athena does not support the GLACIER storage class:

Athena does not support different storage classes within the bucket specified by the LOCATION clause, does not support the GLACIER storage class, and does not support Requester Pays buckets. For more information, see Storage Classes, Changing the Storage Class of an Object in Amazon S3, and Requester Pays Buckets in the Amazon Simple Storage Service Developer Guide.

We are also interested in this; if you get it to work, please let us know how. :-)
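
To see whether a bucket falls under the "different storage classes" restriction quoted above, one option is to tally the storage classes in an object listing. The following is only a sketch: it assumes the objects are dicts shaped like the `Contents` entries of an S3 ListObjectsV2 response (with `Key` and `StorageClass` fields), the sample keys are hypothetical, and treating a missing `StorageClass` as `STANDARD` is an assumption made here for illustration.

```python
from collections import Counter


def storage_class_counts(objects):
    """Count objects per storage class in a ListObjectsV2-style listing."""
    # Assumption: an entry without a StorageClass field is treated as STANDARD.
    return Counter(obj.get("StorageClass", "STANDARD") for obj in objects)


def has_mixed_storage_classes(objects):
    """Return True if the listing contains more than one storage class."""
    return len(storage_class_counts(objects)) > 1


# Hypothetical listing: one hot file and two files transitioned to Glacier.
sample_listing = [
    {"Key": "usage-2017-01-24.tsv", "StorageClass": "STANDARD"},
    {"Key": "usage-2016-06-01.tsv", "StorageClass": "GLACIER"},
    {"Key": "usage-2016-06-02.tsv", "StorageClass": "GLACIER"},
]

print(storage_class_counts(sample_listing))
print(has_mixed_storage_classes(sample_listing))
```

In practice the listing would come from paging through the bucket, which for a flat bucket with petabytes of data could take a long time; the functions above only cover the classification step.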

user6405978 answered Oct 17 '22 11:10



Since the release of February 18, 2019, Athena ignores objects with the GLACIER storage class instead of failing the query:

[…] As a result of fixing this issue, Athena ignores objects transitioned to the GLACIER storage class. Athena does not support querying data from the GLACIER storage class.
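
Because ignored objects silently drop out of query results, it may help to estimate how much of a bucket Athena would actually read. The sketch below is an illustration only, not Athena's implementation: it assumes objects are dicts shaped like S3 ListObjectsV2 `Contents` entries (`Key`, `Size`, `StorageClass`), the sample keys and sizes are hypothetical, and the set of skipped classes contains just `GLACIER` because that is the only class the quoted release note mentions.

```python
def split_by_queryability(objects, skipped_classes=frozenset({"GLACIER"})):
    """Partition a listing into (readable, skipped) by storage class."""
    readable, skipped = [], []
    for obj in objects:
        # Assumption: a missing StorageClass field means STANDARD.
        storage_class = obj.get("StorageClass", "STANDARD")
        if storage_class in skipped_classes:
            skipped.append(obj)
        else:
            readable.append(obj)
    return readable, skipped


def total_size(objects):
    """Sum the Size field (bytes) over a list of listing entries."""
    return sum(obj.get("Size", 0) for obj in objects)


# Hypothetical listing of tab-delimited usage logs.
log_listing = [
    {"Key": "usage-2019-02-17.tsv", "Size": 1200, "StorageClass": "STANDARD"},
    {"Key": "usage-2018-06-01.tsv", "Size": 900, "StorageClass": "GLACIER"},
    {"Key": "usage-2018-06-02.tsv", "Size": 800, "StorageClass": "GLACIER"},
]

readable, skipped = split_by_queryability(log_listing)
print("readable bytes:", total_size(readable))
print("skipped bytes:", total_size(skipped))
```

A large share of skipped bytes would suggest that queries over older date ranges return incomplete results until those objects are restored.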

Theo answered Oct 17 '22 11:10
