Download S3 Objects by List of Keys Using Boto3

I've got a list of keys that I'm retrieving from a cache, and I want to download the associated objects (files) from S3 without having to make a request per key.

Assuming I have the following array of keys:

key_array = [
    '20160901_0750_7c05da39_INCIDENT_MANIFEST.json',
    '20161207_230312_ZX1G222ZS3_INCIDENT_MANIFEST.json',
    '20161211_131407_ZX1G222ZS3_INCIDENT_MANIFEST.json',
    '20161211_145342_ZX1G222ZS3_INCIDENT_MANIFEST.json',
    '20161211_170600_FA68T0303607_INCIDENT_MANIFEST.json'
]

I'm trying to do something similar to this answer on another SO question, but modified like so:

import boto3

s3 = boto3.resource('s3')

incidents = s3.Bucket(my_incident_bucket).objects(key_array)

for incident in incidents:
    # Do fun stuff with the incident body
    incident_body = incident['Body'].read().decode('utf-8')

My ultimate goal is to avoid hitting the AWS API separately for every key in the list. I'd also like to avoid pulling down the whole bucket and then filtering or iterating over the full results.

afilbert asked Mar 11 '23 05:03


1 Answer

I think the best you are going to get is n API calls, where n is the number of keys in your key_array. The Amazon S3 API doesn't offer much in the way of server-side filtering by key, other than prefixes. Here is the code to do it in n API calls:

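import boto3

s3 = boto3.client('s3')

for key in key_array:
    # One GetObject request per key; 'Body' is a stream, so read and decode it
    response = s3.get_object(Bucket="my_incident_bucket", Key=key)
    incident_body = response['Body'].read().decode('utf-8')

    # Do fun stuff with the incident body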
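
If the real concern is wall-clock time rather than the request count, one option is to overlap the requests with a thread pool. This is only a sketch and not something the S3 API provides as a batch call: it still makes one get_object request per key. The helper name fetch_bodies and the max_workers default are my own assumptions, and it assumes every object is UTF-8 text, as in the question.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_bodies(client, bucket, keys, max_workers=8):
    """Return {key: decoded body}, making one get_object call per key.

    `client` is expected to behave like boto3.client('s3').
    The name and the default of 8 workers are illustrative assumptions.
    """

    def fetch(key):
        # Still one API call per key; only the waiting overlaps.
        response = client.get_object(Bucket=bucket, Key=key)
        return key, response["Body"].read().decode("utf-8")

    # pool.map yields results in the same order as `keys`.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(fetch, keys))
```

With a real client this would be called as fetch_bodies(boto3.client('s3'), "my_incident_bucket", key_array). A low-level boto3 client can be shared across threads, which is why the sketch takes a client rather than a resource. If any single request fails, the exception surfaces when the results are collected, so you may want your own error handling around it.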
Kevin Seaman answered Mar 20 '23 06:03