
What is the best way to iterate or recurse through huge amounts of huge functions without exceeding the stack limit?

I have an application that I'm writing in Node.js which needs to make a lot of configuration and database calls in order to process user data. The issue I'm having is that after 11,800+ function calls Node will throw an error and exit the process.

The error says: RangeError: Maximum call stack size exceeded

I'm curious whether anyone else has run into this situation and how they handled it. I've already started to break up my code into a couple of extra worker files, but even so, each time I process a data node it needs to touch 2 databases (at most 25 calls to update various tables) and do a number of sanitization checks.

I'm happy to admit that I may be doing something suboptimal, and I would appreciate some guidance if there is a better approach.

Here is an example of the code I'm running on data:

app.post('/initspeaker', function(req, res) {
    // if the Admin ID is not present ignore
    if(req.body.xyzid!=config.adminid) {
        res.send( {} );
        return;
    }

    var gcnt = 0, dbsize = 0, goutput = [], goutputdata = [], xyzuserdataCallers = [];

    xyz.loadbatchfile( xyz.getbatchurl("speakers", "csv"), function(data) {
        var parsed = csv.parse(data);
        console.log("lexicon", parsed[0]);

        for(var i=1;i<parsed.length;i++) {
            if(typeof parsed[i][0] != 'undefined' && parsed[i][0]!='name') {
                var xyzevent = require('./lib/model/xyz_speaker').create(parsed[i], parsed[0]);
                xyzevent.isPresenter = true;
                goutput.push(xyzevent);
            }
        }
        dbsize = goutput.length;

        xyzuserdataCallers = [new xyzuserdata(),
                                    new xyzuserdata(),
                                    new xyzuserdata(),
                                    new xyzuserdata(),
                                    new xyzuserdata(),
                                    new xyzuserdata(),
                                    new xyzuserdata(),
                                    new xyzuserdata()
                                ];
        // insert all Scheduled Items into the DB                   
        xyzuserdataCallers[0].sendSpeakerData(goutput[0]);
        for(var i=1;i<xyzuserdataCallers;i++) {
            xyzuserdataCallers[i].sendSpeakerData(8008);
        }

        //sendSpeakerData(goutput[0]);
    });

    var callback = function(data, func) {
        //console.log(data);
        if(data && data!=8008) {
            if(gcnt>=dbsize) {
                res.send("done");
            } else {
                gcnt++;
                func.sendSpeakerData(goutput[gcnt]);
            }
        } else {
            gcnt++;
            func.sendSpeakerData(goutput[gcnt]);
        }
    };

    // callback loop for fetching registrants for events from SMW
    var xyzuserdata = function() {};
    xyzuserdata.prototype.sendSpeakerData = function(data) {
        var thisfunc = this;

        if(data && data!=8008) {
            //console.log('creating user from data', gcnt, dbsize);
            var userdata = require('./lib/model/user').create(data.toObject());
            var speakerdata = userdata.toObject();
            speakerdata.uid = uuid.v1();
            speakerdata.isPresenter = true;

            couchdb.insert(speakerdata, config.couch.db.user, function($data) {
                if($data==false) {
                    // if this fails it is probably due to a UID colliding
                    console.log("*** trying user data again ***");
                    speakerdata.uid = uuid.v1();
                    arguments.callee( speakerdata );
                } else {
                    callback($data, thisfunc);
                }
            });
        } else {
            gcnt++;
            arguments.callee(goutput[gcnt]);
        }
    };

});

A couple of classes and items are defined here that need some introduction:

  • I am using Express.js + hosted CouchDB and this is responding to a POST request
  • There is a CSV parser class that loads a list of events which drives pulling speaker data
  • Each event can have n number of users (currently around 8K users for all events)
  • I'm using a pattern that loads all of the data/users before attempting to parse any of them
  • Each user loaded (external data source) is converted into an object I can use and also sanitized (strip slashes and such)
  • Each user is then inserted into CouchDB

This code works in the app, but after a while I get an error saying that more than 11,800 calls have been made, and the app breaks. The error doesn't come with the kind of stack trace one would see for a coding error; the process exits because of the number of calls being made.

Again, any assistance/commentary/direction would be appreciated.

Liam asked Feb 01 '12 17:02


1 Answer

It looks like xyzuserdata.sendSpeakerData and callback are being used recursively in order to keep the DB calls sequential. Every nested call adds a stack frame, so at some point you run out of call stack.
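
As a minimal sketch of why this pattern can overflow, assume a hypothetical `fakeInsert` whose callback fires synchronously. That is an assumption for illustration only: a real CouchDB client may call back later, in which case the stack unwinds between steps. The `else` branch of `sendSpeakerData` in the question, however, does call itself directly through `arguments.callee`, which grows the stack in the same way as this sketch:

```javascript
// Hypothetical stand-in for a database insert. It invokes its callback
// synchronously, which is an assumption made for illustration.
function fakeInsert(doc, cb) {
  cb(true);
}

// Inserts docs one after another by recursing from inside the callback.
// No call returns until the whole chain finishes, so every item adds frames.
function insertAllRecursively(docs, index, done) {
  if (index >= docs.length) {
    done();
    return;
  }
  fakeInsert(docs[index], function () {
    insertAllRecursively(docs, index + 1, done);
  });
}

// Returns true if processing `count` items exhausts the call stack.
function overflowsStack(count) {
  var docs = new Array(count).fill({});
  try {
    insertAllRecursively(docs, 0, function () {});
    return false;
  } catch (err) {
    return err instanceof RangeError;
  }
}

console.log(overflowsStack(10));      // false
console.log(overflowsStack(1000000)); // true with Node's default stack size
```

The exact number of items at which the error appears depends on the stack size and on how large each frame is, so it will not necessarily match the figure in the question.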

There are several modules that make serial execution easier, such as Step or Flow-JS.

Flow-JS even has a convenience function to apply a function serially over the elements of the array:

flow.serialForEach(goutput, xyzuserdata.sendSpeakerData, ...)

I wrote a small test program using flow.serialForEach, but unfortunately it still produced a Maximum call stack size exceeded error. It looks like Flow-JS uses the call stack in a similar way to keep things in sync.

Another approach that doesn't build up the call stack is to avoid recursion and use setTimeout with a timeout value of 0 to schedule the callback call. See http://metaduck.com/post/2675027550/asynchronous-iteration-patterns-in-node-js

You could try replacing the callback call with

setTimeout(callback, 0, $data, thisfunc)
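
To show the idea in isolation, here is a rough sketch of serial iteration that schedules each next step with setTimeout rather than calling it directly. The names `forEachSerial`, `worker` and `done` are hypothetical and not part of the question's code or of any library; the worker stands in for something like sendSpeakerData that reports completion through a callback.

```javascript
// Runs worker(item, cb) on each item in order. The next step is scheduled
// with setTimeout instead of being called directly, so the current call
// returns and its stack frames are released before the next item starts.
function forEachSerial(items, worker, done) {
  var index = 0;

  function next() {
    if (index >= items.length) {
      done();
      return;
    }
    var item = items[index];
    index += 1;
    worker(item, function () {
      setTimeout(next, 0);
    });
  }

  next();
}

// Hypothetical usage: the worker here completes immediately, which is the
// worst case for stack growth when the next step is called directly.
var processed = [];
forEachSerial(['a', 'b', 'c'], function (item, cb) {
  processed.push(item);
  cb();
}, function () {
  console.log('done', processed.length);
});
```

Because every step starts from a fresh timer callback, the stack depth stays constant no matter how many items there are. The trade-off is latency: Node treats a delay of 0 as 1 ms, so each item pays at least that much, which adds up over thousands of items.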

mike answered Oct 16 '22 08:10