I'm using node v0.12.7 and want to stream directly from a database to the client (for file download). However, I am noticing a large memory footprint (and possible memory leak) when using streams.
With express, I create an endpoint that simply pipes a readable stream to the response as follows:
app.post('/query/stream', function(req, res) {
  res.setHeader('Content-Type', 'application/octet-stream');
  res.setHeader('Content-Disposition', 'attachment; filename="blah.txt"');

  //...retrieve stream from somewhere...
  // stream is a readable stream in object mode
  stream
    .pipe(json_to_csv_transform_stream) // I've removed this and see the same behavior
    .pipe(res);
});
In production, the readable stream retrieves data from a database. The amount of data is quite large (1M+ rows). I swapped out this readable stream for a dummy stream (see code below) to simplify debugging and am seeing the same behavior: memory usage jumps by ~200MB on each request. Sometimes garbage collection kicks in and memory drops a bit, but overall it rises linearly until my server runs out of memory.
The reason I started using streams was to not have to load large amounts of data into memory. Is this behavior expected?
I also notice that, while streaming, my CPU usage jumps to 100% and blocks (which means other requests can't be processed).
Am I using this incorrectly?
// Setup a custom readable
var Readable = require('stream').Readable;

function Counter(opt) {
  Readable.call(this, opt);
  this._max = 1000000; // Maximum number of records to generate
  this._index = 1;
}
require('util').inherits(Counter, Readable);

// Override internal read
// Send dummy objects until max is reached
Counter.prototype._read = function() {
  var i = this._index++;
  if (i > this._max) {
    this.push(null);
  } else {
    this.push({
      foo: i,
      bar: i * 10,
      hey: 'dfjasiooas' + i,
      dude: 'd9h9adn-09asd-09nas-0da' + i
    });
  }
};

// Create the readable stream
var counter = new Counter({objectMode: true});

//...return it to calling endpoint handler...
Just a small update: I never found the cause. My initial workaround was to use cluster to spawn off new processes so that other requests could still be handled.
I've since updated to node v4. While CPU/memory usage is still high during processing, it seems to have fixed the leak (i.e. memory usage goes back down afterwards).
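For reference, the cluster workaround looked roughly like this (a minimal sketch; the restart logic and the './app' module name are illustrative, not my exact code):

// Fork one worker per CPU so a request that pegs one core
// doesn't block every other request on the server.
var cluster = require('cluster');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
  cluster.on('exit', function(worker) {
    cluster.fork(); // replace a worker that died (e.g. ran out of memory)
  });
} else {
  require('./app'); // each worker runs its own express server
}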
Update 2: Here's a history of various Stream APIs:
https://medium.com/the-node-js-collection/a-brief-history-of-node-streams-pt-2-bcb6b1fd7468
0.12 uses Streams 3.
Update: This answer was true for the old node.js streams. The new Stream API has a mechanism to pause the readable stream if the writable stream can't keep up.
Backpressure
It looks like you've been hit by the classic node.js "backpressure" problem. This article explains it in detail.
But here's a TL;DR:
You're right, streams exist so that you don't have to load large amounts of data into memory.
But unfortunately streams don't have a mechanism to know whether it's OK to continue streaming. Streams are dumb: they just throw data into the next stream as fast as they can.
In your example you're reading a large CSV file and streaming it to the client. The problem is that reading the file is faster than uploading it over the network, so the data has to be buffered somewhere until it can safely be discarded. That's why your memory keeps growing until the client has finished downloading.
The solution is to throttle the reading stream to the speed of the slowest stream in the pipe, i.e. you prepend your reading stream with another stream that tells it when it's OK to read the next chunk of data.
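For illustration, here's a minimal hand-rolled sketch of that kind of throttling (the helper name is made up, not part of any library): pause the source whenever the destination's write() returns false, and resume once it emits 'drain'.

// Minimal backpressure sketch: stop reading while the destination is backed up.
function pipeWithBackpressure(source, dest) {
  source.on('data', function(chunk) {
    if (!dest.write(chunk)) {      // write() returns false when dest's buffer is full
      source.pause();              // stop the readable until dest flushes
      dest.once('drain', function() {
        source.resume();
      });
    }
  });
  source.on('end', function() {
    dest.end();
  });
}

This is essentially what pipe() does for you under Streams 2/3, which is why the update above notes that newer streams handle this automatically.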
It appears you are doing everything correctly. I copied your test case and am experiencing the same issue in v4.0.0. Taking the stream out of objectMode and avoiding JSON.stringify on your objects appeared to prevent both the high memory and high CPU usage.
That led me to the built-in JSON.stringify, which appears to be the root of the problem. Using the streaming library JSONStream instead of the V8 method fixed this for me. It can be used like this: .pipe(JSONStream.stringify()).
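Applied to the endpoint from the question, the fix would look roughly like this (a sketch, assuming npm install JSONStream and that stream is the object-mode readable from the question):

var JSONStream = require('JSONStream');

app.post('/query/stream', function(req, res) {
  res.setHeader('Content-Type', 'application/octet-stream');
  res.setHeader('Content-Disposition', 'attachment; filename="blah.txt"');

  stream
    .pipe(JSONStream.stringify()) // streams the objects out as a JSON array chunk by chunk
    .pipe(res);
});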