I want to pipe large video files from AWS S3 into Popen's stdin, which from Python's point of view is a 'file-like object'. This code runs as an AWS Lambda function, so the files won't fit in memory or on the local file system, and I don't want to copy them anywhere; I just want to stream the input, process it on the fly, and stream the output. I already have the processing and streaming-output parts working. The problem is how to obtain an input stream to use as a Popen pipe.
Update: based on a comment, I put together a short program that calls StreamingBody.read(amt=chunk_size). The program reads part of the input file (an mp4 video) and then hangs, possibly because the consumer of the data (ffmpeg) never actually runs, or because its stdin buffer fills and the whole pipeline grinds to a halt.
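For reference, a minimal sketch of the chunked-read approach with a separate writer thread, so the consumer's stdout can be drained while stdin is being fed (otherwise both pipe buffers can fill and deadlock, which may be what "grinds to a halt" above). The chunk size is arbitrary, and cat stands in for ffmpeg; an io.BytesIO stands in for the S3 StreamingBody, whose read(amt=...) behaves the same way:

```python
import io
import threading
from subprocess import Popen, PIPE

CHUNK = 64 * 1024  # bytes per read from the source; tunable

def pump(body, proc):
    """Feed chunks from a file-like body into proc.stdin, then close it."""
    try:
        while True:
            chunk = body.read(CHUNK)   # StreamingBody.read(amt=CHUNK) works the same
            if not chunk:              # empty bytes => end of stream
                break
            proc.stdin.write(chunk)
    finally:
        proc.stdin.close()             # send EOF so the consumer can finish

# Stand-ins: BytesIO for botocore's StreamingBody, `cat` for ffmpeg.
body = io.BytesIO(b"fake video bytes" * 1000)
proc = Popen(["cat"], stdin=PIPE, stdout=PIPE)

writer = threading.Thread(target=pump, args=(body, proc))
writer.start()
out = proc.stdout.read()   # drain stdout concurrently so neither pipe fills up
writer.join()
proc.wait()
```

The key point is that writing to stdin and reading from stdout happen on different threads; a single thread doing both sequentially is exactly the deadlock communicate() exists to avoid.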
I can access a file in an S3 bucket:
import boto3
s3 = boto3.resource('s3')
response = s3.Object(bucket_name=bucket, key=key).get()
body = response['Body']
body is a botocore.response.StreamingBody. The full response looks like this:
{
u'Body': <botocore.response.StreamingBody object at 0x00000000042EDAC8>,
u'AcceptRanges': 'bytes',
u'ContentType': 'video/mp4',
'ResponseMetadata': {
'HTTPStatusCode': 200,
'HostId': 'aAUs3IdkXP6vPGwauv6/USEBUWfxxVeueNnQVAm4odTkPABKUx1EbZO/iLcrBWb+ZiyqmQln4XU=',
'RequestId': '6B306488F6DFEEE9'
},
u'LastModified': datetime.datetime(2015, 3, 1, 1, 32, 58, tzinfo=tzutc()),
u'ContentLength': 393476644,
u'ETag': '"71079d637e9f14a152170efdf73df679"',
u'Metadata': {'cb-modifiedtime': 'Sun, 01 Mar 2015 01:27:52 GMT'}}
I intend to use body something like this:
from subprocess import Popen, PIPE
Popen(cmd, stdin=PIPE, stdout=PIPE).communicate(input=body)[0]
But of course body needs to be converted into a file-like object. The question is how?
To read binary data from a StreamingBody, use StreamingBody.read(). It returns a binary string (bytes).
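Since the files here are ~400 MB, a bare read() would pull the whole object into memory; a sketch of reading in bounded chunks instead (an io.BytesIO stands in for the StreamingBody, whose read accepts the same amt argument):

```python
import io

CHUNK = 1024 * 1024  # 1 MiB per read; with botocore, body.read(amt=CHUNK)

# Stand-in for response['Body']: ~3 MiB of dummy data plus a ragged tail.
body = io.BytesIO(b"x" * (3 * 1024 * 1024 + 5))

total = 0
while True:
    chunk = body.read(CHUNK)
    if not chunk:        # empty bytes signals end of stream
        break
    total += len(chunk)  # here you would write the chunk to the subprocess
print(total)             # -> 3145733
```

Each chunk can then be written to the subprocess's stdin as it arrives, so memory use stays bounded by the chunk size.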