I'm trying to stream-upload a file submitted via a form directly to an Amazon S3 bucket, using aws-sdk or knox. Form handling is done with formidable.
My question is: how do I properly use formidable with aws-sdk (or knox) using each of these libraries' latest features for handling streams?
I'm aware that this topic has already been asked here in different flavors. However, I believe the answers are a bit outdated and/or off topic (e.g. CORS support, which I don't wish to use for now for various reasons) and/or, most importantly, make no reference to the latest features of either aws-sdk (see: https://github.com/aws/aws-sdk-js/issues/13#issuecomment-16085442) or knox (notably putStream() or its readableStream.pipe(req) variant, both explained in the docs).
After hours of struggling, I came to the conclusion that I needed some help (disclaimer: I'm quite a newbie with streams).
HTML form:
<form action="/uploadPicture" method="post" enctype="multipart/form-data">
  <input name="picture" type="file" accept="image/*">
  <input type="submit">
</form>
Express bodyParser middleware is configured this way:
app.use(express.bodyParser({defer: true}))
POST request handler:
uploadPicture = (req, res, next) ->
  form = new formidable.IncomingForm()
  form.parse(req)
  form.onPart = (part) ->
    if not part.filename
      # Let formidable handle all non-file parts (fields)
      form.handlePart(part)
    else
      handlePart(part, form.bytesExpected)
handlePart = (part, fileSize) ->
  # aws-sdk version
  params =
    Bucket: "mybucket"
    Key: part.filename
    ContentLength: fileSize
    Body: part # passing stream object as body parameter
  awsS3client.putObject(params, (err, data) ->
    if err
      console.log err
    else
      console.log data
  )
However, I'm getting the following error:
{ [RequestTimeout: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.]
  message: 'Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.',
  code: 'RequestTimeout',
  name: 'RequestTimeout',
  statusCode: 400,
  retryable: false }
A knox version of the handlePart() function, tailored this way, also fails miserably:
handlePart = (part, fileSize) ->
  headers =
    "Content-Length": fileSize
    "Content-Type": part.mime
  knoxS3client.putStream(part, part.filename, headers, (err, res) ->
    if err
      console.log err
    else
      console.log res
  )
I also get a big res object with a 400 statusCode somewhere.
Region is configured to eu-west-1 in both cases.
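For completeness, the readableStream.pipe(req) variant from the knox docs would look roughly like the sketch below. It is not a fix: fileSize here is still form.bytesExpected, so the same Content-Length mismatch applies.

handlePart = (part, fileSize) ->
  headers =
    "Content-Length": fileSize   # still the whole form's size, hence still wrong
    "Content-Type": part.mime
  s3req = knoxS3client.put("/#{part.filename}", headers)
  s3req.on 'response', (s3res) ->
    console.log s3res.statusCode
  part.pipe(s3req)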
Additional notes:
node 0.10.12
latest formidable from npm (1.0.14)
latest aws-sdk from npm (1.3.1)
latest knox from npm (0.8.3)
When you upload large files to Amazon S3, it's a best practice to leverage multipart uploads. If you're using the AWS Command Line Interface (AWS CLI), then all high-level aws s3 commands automatically perform a multipart upload when the object is large. These high-level commands include aws s3 cp and aws s3 sync.
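On the SDK side, note that if you can move past the aws-sdk 1.3.1 used above, the 2.x line ships an s3.upload() method (AWS.S3.ManagedUpload) that does the same thing programmatically: it accepts a Body stream of unknown length and performs a multipart upload under the hood, so no ContentLength is required. A minimal sketch, assuming an aws-sdk 2.x client and the same hypothetical bucket as in the question:

AWS = require 'aws-sdk'
s3 = new AWS.S3(region: 'eu-west-1')

handlePart = (part) ->
  params =
    Bucket: "mybucket"   # assumed bucket name, as in the question
    Key: part.filename
    ContentType: part.mime
    Body: part           # a stream; upload() buffers it into parts itself
  s3.upload params, (err, data) ->
    if err
      console.log err
    else
      console.log data   # data.Location holds the resulting object URL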
For the reverse direction, you can stream a file from Amazon S3 to the client through your server without first downloading it to the server: open a read stream on the S3 object, then read from it and write to the client stream, buffer by buffer, using Node's Stream API.
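A minimal sketch of that download direction, again assuming an aws-sdk 2.x client (s3.getObject(...) returns an AWS.Request whose createReadStream() yields a readable stream) and a hypothetical Express route:

app.get '/picture/:key', (req, res) ->
  params =
    Bucket: "mybucket"   # assumed bucket name
    Key: req.params.key
  stream = s3.getObject(params).createReadStream()
  stream.on 'error', (err) ->
    console.log err
    res.send(500)        # Express 3.x-style error response
  stream.pipe(res)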
Well, according to the creator of Formidable, direct streaming to Amazon S3 is impossible:
The S3 API requires you to provide the size of new files when creating them. This information is not available for multipart/form-data files until they have been fully received. This means streaming is impossible.
Indeed, form.bytesExpected refers to the size of the whole form, and not the size of the single file.
The data must therefore either hit the memory or the disk on the server first before being uploaded to S3.
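So a workaround is to buffer each file part first and only upload once the real byte count is known. A sketch of the in-memory variant, reusing the awsS3client from the question (fine for small pictures; anything large should be spooled to a temp file on disk instead):

handlePart = (part) ->
  buffers = []
  part.on 'data', (chunk) ->
    buffers.push chunk
  part.on 'end', ->
    body = Buffer.concat(buffers)
    params =
      Bucket: "mybucket"
      Key: part.filename
      ContentLength: body.length   # the actual file size this time
      Body: body
    awsS3client.putObject params, (err, data) ->
      if err
        console.log err
      else
        console.log data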