Node streams cause large memory footprint or leak

I'm using node v0.12.7 and want to stream directly from a database to the client (for file download). However, I am noticing a large memory footprint (and possible memory leak) when using streams.

With express, I create an endpoint that simply pipes a readable stream to the response as follows:

app.post('/query/stream', function(req, res) {

  res.setHeader('Content-Type', 'application/octet-stream');
  res.setHeader('Content-Disposition', 'attachment; filename="blah.txt"');

  //...retrieve stream from somewhere...
  // stream is a readable stream in object mode

  stream
    .pipe(json_to_csv_transform_stream) // I've removed this and see the same behavior
    .pipe(res);
});

In production, the readable stream retrieves data from a database. The amount of data is quite large (1M+ rows). I swapped out this readable stream for a dummy stream (see code below) to simplify debugging and am noticing the same behavior: my memory usage jumps up by ~200MB each time. Sometimes garbage collection kicks in and memory drops a bit, but it rises roughly linearly until my server runs out of memory.

The reason I started using streams was to not have to load large amounts of data into memory. Is this behavior expected?

I also notice that, while streaming, my CPU usage jumps to 100% and the process blocks (which means other requests can't be processed).

Am I using this incorrectly?

Dummy readable stream code

// Setup a custom readable
var Readable = require('stream').Readable;

function Counter(opt) {
  Readable.call(this, opt);
  this._max = 1000000; // Maximum number of records to generate
  this._index = 1;
}
require('util').inherits(Counter, Readable);

// Override internal read
// Send dummy objects until max is reached
Counter.prototype._read = function() {
  var i = this._index++;
  if (i > this._max) {
    this.push(null);
  }
  else {
    this.push({
      foo: i,
      bar: i * 10,
      hey: 'dfjasiooas' + i,
      dude: 'd9h9adn-09asd-09nas-0da' + i
    });
  }
};

// Create the readable stream
var counter = new Counter({objectMode: true});

//...return it to calling endpoint handler...

Update

Just a small update: I never found the cause. My initial solution was to use cluster to spawn off new worker processes so that other requests could still be handled.
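
A minimal sketch of that workaround (not my exact code; ./app is a hypothetical module that creates the express app and calls listen()):

var cluster = require('cluster');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // Fork one worker per CPU so a worker pegged at 100% CPU by a
  // streaming download doesn't stop the others from serving requests
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
  // Replace workers that crash or get killed when they run out of memory
  cluster.on('exit', function() {
    cluster.fork();
  });
} else {
  require('./app'); // each worker runs its own express server
}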

I've since updated to node v4. While CPU/memory usage is still high during processing, it seems to have fixed the leak (memory usage goes back down afterwards).

asked Sep 18 '15 by lebolo


2 Answers

Update 2: Here's a history of various Stream APIs:

https://medium.com/the-node-js-collection/a-brief-history-of-node-streams-pt-2-bcb6b1fd7468

0.12 uses Streams 3.

Update: This answer was true for the old node.js streams. The newer Stream API has a mechanism to pause the readable stream if the writable stream can't keep up.

Backpressure

It looks like you've been hit by the classic node.js "backpressure" problem. This article explains it in detail.

But here's a TL;DR:

You're right: streams exist so you don't have to load large amounts of data into memory.

But unfortunately streams don't have a mechanism to know whether it's ok to continue streaming. Streams are dumb: they just throw data into the next stream as fast as they can.

In your example you're reading a large CSV file and streaming it to the client. The problem is that the data can be read faster than it can be uploaded over the network, so it has to be buffered somewhere until it can safely be discarded. That's why your memory keeps growing until the client finishes downloading.

The solution is to throttle the reading stream to the speed of the slowest stream in the pipe, i.e. to have the downstream consumer tell your reading stream when it's ok to read the next chunk of data.
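
To make this concrete, here's a rough sketch of what that throttling looks like if you wire it up by hand (assuming stream emits string/Buffer chunks rather than objects, and res is the express response). This is essentially what pipe() already does for you in Streams 2/3:

stream.on('data', function(chunk) {
  // write() returns false once the response's internal buffer is full
  if (!res.write(chunk)) {
    stream.pause(); // stop reading from the source...
    res.once('drain', function() {
      stream.resume(); // ...and continue once the buffer has been flushed
    });
  }
});

stream.on('end', function() {
  res.end();
});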

answered by Vanuan


It appears you are doing everything correctly. I copied your test case and am experiencing the same issue in v4.0.0. Taking it out of objectMode and using JSON.stringify on your objects appeared to prevent both the high memory and the high CPU. That led me to the built-in JSON.stringify, which appears to be the root of the problem. Using the streaming library JSONStream instead of the V8 method fixed this for me. It can be used like this: .pipe(JSONStream.stringify()).
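
For reference, a minimal sketch of how that might look in the endpoint from the question (assuming JSONStream has been installed from npm):

var JSONStream = require('JSONStream');

app.post('/query/stream', function(req, res) {

  res.setHeader('Content-Type', 'application/octet-stream');
  res.setHeader('Content-Disposition', 'attachment; filename="blah.txt"');

  // stream is the object-mode readable from the question
  stream
    .pipe(JSONStream.stringify()) // serializes each object incrementally instead of building one giant string
    .pipe(res);
});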

answered by Cody Gustafson