Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Stream uploading file to S3 on Node.js using formidable and (knox or aws-sdk)

I'm trying to stream upload a file submitted via a form directly to an Amazon S3 bucket, using aws-sdk or knox. Form handling is done with formidable.

My question is: how do I properly use formidable with aws-sdk (or knox) using each of these libraries' latest features for handling streams?

I'm aware that this topic has already been asked here in different flavors, ie:

  • How to receive an uploaded file using node.js formidable library and save it to Amazon S3 using knox?
  • node application stream file upload directly to amazon s3
  • Accessing the raw file stream from a node-formidable file upload (and its very useful accepted answer on overiding form.onPart())

However, I believe the answers are a bit outdated and/or off topic (ie. CORS support, which I don't wish to use for now for various reasons) and/or, most importantly, make no reference to the latest features from either aws-sdk (see: https://github.com/aws/aws-sdk-js/issues/13#issuecomment-16085442) or knox (notably putStream() or its readableStream.pipe(req) variant, both explained in the doc).

After hours of struggling, I came to the conclusion that I needed some help (disclaimer: I'm quite a newbie with streams).

HTML form:

<form action="/uploadPicture" method="post" enctype="multipart/form-data">
  <input name="picture" type="file" accept="image/*">
  <input type="submit">
</form>

Express bodyParser middleware is configured this way:

app.use(express.bodyParser({defer: true}))

POST request handler:

uploadPicture = (req, res, next) ->
  form = new formidable.IncomingForm()
  form.parse(req)

  form.onPart = (part) ->
    if not part.filename
      # Let formidable handle all non-file parts (fields)
      form.handlePart(part)
    else
      handlePart(part, form.bytesExpected)

  handlePart = (part, fileSize) ->
    # aws-sdk version
    params =
      Bucket: "mybucket"
      Key: part.filename
      ContentLength: fileSize
      Body: part # passing stream object as body parameter

    awsS3client.putObject(params, (err, data) ->
      if err
        console.log err
      else
        console.log data
    )

However, I'm getting the following error:

{ [RequestTimeout: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.]

message: 'Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.', code: 'RequestTimeout', name: 'RequestTimeout', statusCode: 400, retryable: false }

A knox version of handlePart() function tailored this way also miserably fails:

handlePart = (part, fileSize) ->
  headers =
    "Content-Length": fileSize
    "Content-Type": part.mime
  knoxS3client.putStream(part, part.filename, headers, (err, res) ->
    if err
      console.log err
    else
      console.log res
  )      

I also get a big res object with a 400 statusCode somewhere.

Region is configured to eu-west-1 in both case.

Additional notes:

node 0.10.12

latest formidable from npm (1.0.14)

latest aws-sdk from npm (1.3.1)

latest knox from npm (0.8.3)

like image 201
jbmusso Avatar asked Jun 26 '13 00:06

jbmusso


People also ask

What is the best way for the application to upload the large files in S3?

When you upload large files to Amazon S3, it's a best practice to leverage multipart uploads. If you're using the AWS Command Line Interface (AWS CLI), then all high-level aws s3 commands automatically perform a multipart upload when the object is large. These high-level commands include aws s3 cp and aws s3 sync.

Can you stream files from S3?

You can stream the file from Amazon S3 to the client through your server without downloading the file to your server, by opening a stream to the Amazon S3 file then read from it and write on the client stream (buffer by buffer). You can use Stream.


1 Answers

Well, according to the creator of Formidable, direct streaming to Amazon S3 is impossible :

The S3 API requires you to provide the size of new files when creating them. This information is not available for multipart/form-data files until they have been fully received. This means streaming is impossible.

Indeed, form.bytesExpected refers to the size of the whole form, and not the size of the single file.

The data must therefore either hit the memory or the disk on the server first before being uploaded to S3.

like image 147
jbmusso Avatar answered Sep 18 '22 15:09

jbmusso