I am trying to write an app that will allow my users to upload files to my Google Cloud Storage account. In order to prevent overwrites and to do some custom handling and logging on my side, I'm using a Node.js server as a middleman for the upload. So the process is:
I'm getting a little lost on step 3: exactly how to send that file to GCS. This question gives some helpful insight, as well as a nice example, but I'm still confused.
I understand that I can open a ReadStream for the temporary upload file and pipe that to the http.request() object. What I'm confused about is how to signify in my POST request that the piped data is the file variable. According to the GCS API Docs, there needs to be a file variable, and it needs to be the last one.
So, how do I specify a POST variable name for the piped data?
Bonus points if you can tell me how to pipe it directly from my user's upload, rather than storing it in a temporary file
I believe that if you want to do a POST, you have to use a Content-Type: multipart/form-data; boundary=myboundary header. Then, in the body, write() something like this for each string field (line breaks should be \r\n, and a blank line separates the part headers from the value):
--myboundary
Content-Disposition: form-data; name="field_name"

field_value
And then for the file itself, write() something like this to the body (again with a blank line between the part headers and the data):
--myboundary
Content-Disposition: form-data; name="file"; filename="urlencoded_filename.jpg"
Content-Type: image/jpeg
Content-Transfer-Encoding: binary

binary_file_data
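As a rough sketch, the text that precedes the file bytes could be assembled like this. The helper name buildMultipartPreamble and its parameters are hypothetical, and the field names are only the placeholders used above, so treat it as an illustration of the layout rather than a tested GCS client.

```javascript
// Hypothetical helper: builds everything that comes before the raw file bytes.
function buildMultipartPreamble(boundary, fields, file) {
    var CRLF = '\r\n';
    var parts = [];
    // One part per string field: headers, blank line, value.
    Object.keys(fields).forEach(function (name) {
        parts.push(
            '--' + boundary + CRLF +
            'Content-Disposition: form-data; name="' + name + '"' + CRLF +
            CRLF +
            fields[name] + CRLF
        );
    });
    // The file part goes last. Its headers end with a blank line,
    // after which the binary data is written or piped.
    parts.push(
        '--' + boundary + CRLF +
        'Content-Disposition: form-data; name="file"; filename="' +
            encodeURIComponent(file.filename) + '"' + CRLF +
        'Content-Type: ' + file.contentType + CRLF +
        'Content-Transfer-Encoding: binary' + CRLF +
        CRLF
    );
    return parts.join('');
}

var preamble = buildMultipartPreamble(
    'myboundary',
    { field_name: 'field_value' },
    { filename: 'photo.jpg', contentType: 'image/jpeg' }
);
```

The returned string would be passed to write() on the request before the file data is sent.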
The binary_file_data is where you use pipe():
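var fs = require('fs');
// requestToGoogle is the http.request() object for the POST
var fileStream = fs.createReadStream("path/to/my/file.jpg");
fileStream.pipe(requestToGoogle, {end: false});
fileStream.on('end', function() {
    // a CRLF terminates the file data, then comes the closing boundary
    requestToGoogle.end("\r\n--myboundary--\r\n");
});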
The {end: false} prevents pipe() from automatically closing the request, because you need to write one more boundary after you're finished sending the file. Note the extra -- on the end of the boundary.
The big gotcha is that Google may require a content-length header (very likely). If that is the case, then you cannot stream a POST from your user to a POST to Google, because you won't reliably know what the content-length is until you've received the entire file.
The content-length header's value should be a single number for the entire body. The simple way to do this is to call Buffer.byteLength(body) on the entire body, but that gets ugly quickly if you have large files, and it also kills the streaming. An alternative would be to calculate it like so:
var fs = require('fs');
var body_before_file = "..."; // string fields + boundary and metadata for the file
var body_after_file = "\r\n--myboundary--\r\n"; // CRLF after the file data, then the closing boundary
fs.stat(local_path_to_file, function(err, file_info) {
    if (err) throw err;
    var content_length = Buffer.byteLength(body_before_file) +
        file_info.size +
        Buffer.byteLength(body_after_file);
    // create request to google, write content-length and other headers
    // write() the body_before_file part,
    // and then pipe the file and end the request like we did above
});
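To make the arithmetic concrete, here is a small sketch of the same calculation as a standalone function. The name multipartContentLength, the sample filename, and the 1024-byte file size are made up for illustration. The point it shows is that byte length and string length differ as soon as the headers contain non-ASCII characters, which is why Buffer.byteLength() is used instead of .length.

```javascript
// Hypothetical helper: total body size is the text before the file,
// plus the file's size in bytes, plus the closing boundary.
function multipartContentLength(bodyBeforeFile, fileSize, bodyAfterFile) {
    return Buffer.byteLength(bodyBeforeFile) +
        fileSize +
        Buffer.byteLength(bodyAfterFile);
}

// The filename contains one non-ASCII character (e with acute accent),
// which takes two bytes in UTF-8 but counts as one in String.length.
var before = '--myboundary\r\n' +
    'Content-Disposition: form-data; name="file"; filename="caf\u00e9.jpg"\r\n' +
    '\r\n';
var after = '\r\n--myboundary--\r\n';

// 1024 stands in for the size that fs.stat() would report.
var total = multipartContentLength(before, 1024, after);
```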
But that still kills your ability to stream from the user to Google: the file has to be downloaded to the local disk to determine its length.
Alternate option
...now, after going through all of that, PUT might be your friend here. According to https://developers.google.com/storage/docs/reference-methods#putobject you can use a transfer-encoding: chunked header, so you don't need to find the file's length. And, I believe that the entire body of the request is just the file, so you can use pipe() and just let it end the request when it's done. If you're using https://github.com/felixge/node-formidable to handle uploads, then you can do something like this:
incomingForm.onPart = function(part) {
    if (part.filename) {
        var req = ... // create a PUT request to google and set the headers
        part.pipe(req);
    } else {
        // let formidable handle all non-file parts
        incomingForm.handlePart(part);
    }
};
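For what it's worth, creating that PUT request might look roughly like the sketch below. The helper names buildPutOptions and putStream are hypothetical, and the host, path layout, and bearer-token Authorization header are assumptions on my part that should be checked against the GCS documentation and your own auth setup. Only the chunked transfer-encoding and the "body is just the file" idea come from the discussion above.

```javascript
var https = require('https');

// Hypothetical helper: request options for a streaming PUT.
// Host, path layout, and the bearer token are assumptions, not verified GCS details.
function buildPutOptions(bucket, objectName, contentType, accessToken) {
    return {
        method: 'PUT',
        hostname: 'storage.googleapis.com',
        path: '/' + encodeURIComponent(bucket) + '/' + encodeURIComponent(objectName),
        headers: {
            'Content-Type': contentType,
            // No content-length: the body is sent in chunks as it arrives.
            'Transfer-Encoding': 'chunked',
            'Authorization': 'Bearer ' + accessToken
        }
    };
}

// Pipes any readable stream (for example a formidable part) into the PUT.
// pipe() ends the request when the source ends, so no boundary is needed.
function putStream(source, options, callback) {
    var req = https.request(options, function (res) {
        res.resume(); // discard the response body
        res.on('end', function () { callback(null, res.statusCode); });
    });
    req.on('error', callback);
    source.pipe(req);
    return req;
}

var options = buildPutOptions('my-bucket', 'photo.jpg', 'image/jpeg', 'ACCESS_TOKEN');
```

Inside onPart, something like putStream(part, options, callback) would then take the place of the elided request creation.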