Read .pptx file from s3

Question

I am trying to open a .pptx file from Amazon S3 and read it with the python-pptx library. This is my code:

from pptx import Presentation
import boto3

s3 = boto3.resource('s3')

obj = s3.Object('bucket', 'key')
body = obj.get()['Body']
prs = Presentation(body)

It raises "AttributeError: 'StreamingBody' object has no attribute 'seek'". Shouldn't this work? How can I fix it? I also tried calling read() on body first. Is there a solution without actually downloading the file?

bcosta12 · Accepted Answer

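pptx.Presentation needs a seekable file-like object, and the StreamingBody that boto3 returns is not seekable. To load a file from S3, read the object's bytes (all at once, or by streaming them) and wrap them in io.BytesIO, which pptx.Presentation can handle. The data stays in memory; nothing has to be written to disk.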

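import io
import boto3

from pptx import Presentation

s3 = boto3.client('s3')
s3_response_object = s3.get_object(Bucket='bucket', Key='file.pptx')
# Read the whole object into memory; the StreamingBody itself has no seek().
object_content = s3_response_object['Body'].read()

# Wrap the bytes in a seekable in-memory file that Presentation can open.
prs = Presentation(io.BytesIO(object_content))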
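The same idea can be wrapped in a small helper. This is only a sketch: the function name read_s3_object_into_buffer is made up for illustration, and it assumes the client you pass in is a boto3 S3 client, whose download_fileobj method writes the object into any writable file-like object. The bytes are still transferred from S3, but they only ever live in memory.

```python
import io


def read_s3_object_into_buffer(s3_client, bucket, key):
    """Copy an S3 object into a seekable in-memory buffer.

    Hypothetical helper: s3_client is assumed to be a boto3 S3 client
    (anything exposing download_fileobj with the same keywords works).
    """
    buffer = io.BytesIO()
    # download_fileobj writes the object's bytes into the buffer in chunks.
    s3_client.download_fileobj(Bucket=bucket, Key=key, Fileobj=buffer)
    # Rewind so the next reader starts at the first byte.
    buffer.seek(0)
    return buffer
```

With boto3 and python-pptx installed, usage would look like prs = Presentation(read_s3_object_into_buffer(boto3.client('s3'), 'bucket', 'file.pptx')).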

ref:

"Just like what we do with variables, data can be kept as bytes in an in-memory buffer when we use the io module’s Byte IO operations." (journaldev)
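To see why the in-memory buffer matters, here is a rough illustration that needs neither S3 nor python-pptx. A .pptx file is a ZIP archive, and as far as I can tell python-pptx opens it through the standard zipfile module, which has to seek to the end of the file to find the archive directory. The NonSeekableStream class below is a made-up stand-in for StreamingBody (read() only, no seek()), and the archive contents are placeholder data, not a real presentation.

```python
import io
import zipfile


class NonSeekableStream:
    """Hypothetical stand-in for a network stream: read() but no seek()."""

    def __init__(self, data):
        self._inner = io.BytesIO(data)

    def read(self, size=-1):
        return self._inner.read(size)


def build_sample_zip():
    """Build a tiny ZIP archive in memory (placeholder content only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("ppt/presentation.xml", "<placeholder/>")
    return buf.getvalue()


def list_members(file_like):
    """Open a ZIP archive from a file-like object and list its members."""
    with zipfile.ZipFile(file_like) as archive:
        return archive.namelist()


data = build_sample_zip()

# The raw stream fails because zipfile calls seek() on it.
try:
    list_members(NonSeekableStream(data))
    stream_error = None
except AttributeError as exc:
    stream_error = exc

# The same bytes wrapped in BytesIO are seekable, so the archive opens.
members = list_members(io.BytesIO(data))
```

The first call fails in the same way as the error in the question, while the BytesIO version opens the archive normally.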

Tags: python, amazon-s3, boto3, python-pptx

piratenking
