 

Node reading file in specified chunk size

The goal: Upload large files to AWS Glacier without holding the whole file in memory.

I'm currently uploading to Glacier using fs.readFileSync() and things are working. But I need to handle files larger than 4GB, and I'd like to upload multiple chunks in parallel. This means moving to multipart uploads. I can choose the chunk size, but then Glacier requires every chunk to be the same size (except the last).
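A minimal sketch of that constraint (the path and part size here are just illustrative): with a fixed part size, the byte range of every part can be computed up front from the file size, and only the last part may be shorter.

var fs = require('fs');

var PART_SIZE = 16 * 1024 * 1024;          // illustrative part size; every part but the last must be this size
var filePath = '/tmp/foo';                 // hypothetical path
var fileSize = fs.statSync(filePath).size;

var ranges = [];
for (var start = 0; start < fileSize; start += PART_SIZE) {
  var end = Math.min(start + PART_SIZE, fileSize) - 1; // inclusive end byte of this part
  ranges.push({ start: start, end: end });
}
// `ranges` now lists every part's byte range and can drive the uploads.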

This thread suggests that I can set a chunk size on a read stream but that I'm not actually guaranteed to get it.

Any info on how I can get consistent parts without reading the whole file into memory and splitting it up manually?

Assuming I can get to that point, I was just going to use cluster with a few processes pulling chunks off the stream as fast as they can upload them to AWS. If that seems like the wrong way to parallelize the work, I'd love suggestions there.
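For what it's worth, a minimal sketch of that cluster idea (all names and sizes are illustrative, and the actual upload call is left as a comment): the master computes how many parts there are and deals them out to the workers, and each worker opens the file itself and reads only its assigned parts at explicit positions, so no process ever holds more than one part in memory.

var cluster = require('cluster');
var fs = require('fs');

var PART_SIZE = 16 * 1024 * 1024; // illustrative
var WORKERS = 4;
var filePath = '/tmp/foo';        // hypothetical path

if (cluster.isMaster) {
  var fileSize = fs.statSync(filePath).size;
  var partCount = Math.ceil(fileSize / PART_SIZE);

  for (var w = 0; w < WORKERS; w++) {
    var parts = [];
    for (var p = w; p < partCount; p += WORKERS) parts.push(p); // round-robin assignment
    cluster.fork().send({ parts: parts, fileSize: fileSize });
  }
} else {
  process.on('message', function(msg) {
    var fd = fs.openSync(filePath, 'r');
    msg.parts.forEach(function(p) {
      var start = p * PART_SIZE;
      var length = Math.min(PART_SIZE, msg.fileSize - start);
      var buffer = Buffer.alloc(length);
      var bytesRead = 0;
      while (bytesRead < length) { // loop in case of a short read
        bytesRead += fs.readSync(fd, buffer, bytesRead, length - bytesRead, start + bytesRead);
      }
      // upload `buffer` as part `p` here (e.g. glacier.uploadMultipartPart)
    });
    fs.closeSync(fd);
    process.exit(0);
  });
}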

asked Aug 04 '14 by kjs3

People also ask

How do I read a file in chunks in Node?

var fs = require('fs');
var data = '';
var readStream = fs.createReadStream('/tmp/foo.txt', { highWaterMark: 1 * 1024, encoding: 'utf8' });
readStream.on('data', function(chunk) {
  data += chunk;
  console.log(chunk);
});

How do I read a large file in node?

Node.js readable streams have a method called pipe() that lets you write the result of a read stream directly into another file. All you need to do is initialize the read stream and the write stream, then use the pipe() method to send the contents of the read stream into the output file.
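For example (a minimal sketch; the file names are just placeholders):

var fs = require('fs');

var readStream = fs.createReadStream('/tmp/input.txt');
var writeStream = fs.createWriteStream('/tmp/output.txt');

// stream the input file into the output file without loading it all into memory
readStream.pipe(writeStream);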

What is chunk in node JS?

A chunk is a fragment of the data sent from the client to the server. The chunks are concatenated together to form the buffer of the stream, and that buffer is then converted into meaningful data.
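For instance, an HTTP server receives a request body as a series of chunks and typically concatenates them once the stream ends (a minimal sketch):

var http = require('http');

http.createServer(function(req, res) {
  var chunks = [];
  req.on('data', function(chunk) {
    chunks.push(chunk); // each chunk is one Buffer fragment of the request body
  });
  req.on('end', function() {
    var body = Buffer.concat(chunks); // the chunks joined back into the full payload
    res.end('Received ' + body.length + ' bytes\n');
  });
}).listen(3000);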

Which FS module method can be used to read the content of a file without buffering it in memory?

Explanation: from the official docs, to minimize memory costs, when possible prefer streaming via fs.createReadStream().


1 Answer

If nothing else, you can just use fs.open(), fs.read(), and fs.close() manually. Example:

var fs = require('fs');

var CHUNK_SIZE = 10 * 1024 * 1024, // 10MB
    buffer = Buffer.alloc(CHUNK_SIZE),
    filePath = '/tmp/foo';

fs.open(filePath, 'r', function(err, fd) {
  if (err) throw err;
  function readNextChunk() {
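    // passing `null` as the position makes each fs.read() continue from the
    // current file position, so the chunks are read sequentially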
    fs.read(fd, buffer, 0, CHUNK_SIZE, null, function(err, nread) {
      if (err) throw err;

      if (nread === 0) {
        // done reading file, do any necessary finalization steps

        fs.close(fd, function(err) {
          if (err) throw err;
        });
        return;
      }

      var data;
      if (nread < CHUNK_SIZE)
        data = buffer.slice(0, nread);
      else
        data = buffer;

      // do something with `data`, then call `readNextChunk();`
    });
  }
  readNextChunk();
});
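For the Glacier use case, one way to build on this (a sketch only; the names and the range format are assumptions to check against the SDK docs) is to track the byte offset of each chunk so it can become the part's range, and to hold off on the next read until the chunk has been handed to the uploader, since the buffer is reused:

var fs = require('fs');

// Sketch: read `filePath` in fixed-size chunks and pass each chunk, with its
// byte range, to an `uploadPart(data, range, cb)` callback that you supply
// (e.g. one that wraps glacier.uploadMultipartPart).
function readInChunks(filePath, chunkSize, uploadPart, done) {
  var buffer = Buffer.alloc(chunkSize);
  var position = 0;

  fs.open(filePath, 'r', function(err, fd) {
    if (err) return done(err);

    (function readNext() {
      fs.read(fd, buffer, 0, chunkSize, position, function(err, nread) {
        if (err) return done(err);
        if (nread === 0) return fs.close(fd, done); // end of file

        var data = nread < chunkSize ? buffer.slice(0, nread) : buffer;
        // Glacier-style range for this part, e.g. 'bytes 0-10485759/*'
        var range = 'bytes ' + position + '-' + (position + nread - 1) + '/*';
        position += nread;

        // the buffer is reused, so the next read only starts after uploadPart
        // has finished with (or copied) this chunk
        uploadPart(data, range, function(err) {
          if (err) return done(err);
          readNext();
        });
      });
    })();
  });
}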
answered Nov 10 '22 by mscdex