
How can I force a Mongoose save() call to be synchronous?

I am writing a script in Node.js which needs to do the following:

  1. Open XML file
  2. For each node in file
  3. Do a MongoDB lookup to try to find the object relating to this node
  4. If the object is not found, create it; otherwise manipulate the found object in some manner
  5. Save the (possibly new) object back to the database
  6. Go to step 2

I have looked at this for some time and come to the conclusion that it is almost impossible to do this with asynchronous MongoDB calls. The problems are multiple, but for example, if you are dealing with 20,000 of these nodes, then doing it async will hang the script. However, doing them as a batch insert isn't feasible either, because step 4 needs to check whether the object already exists.

It would be possible to cobble something horrible together which caches the created objects and then saves them in something like a step 7, except it would be difficult because there are multiple collections that the objects are going into, and at step 4 you would need to try to look up objects in the cache first, then in the database. If that is the solution then I will just write off JavaScript as broken and write this in Perl instead. So my question is this: for something as simple as the above sequence of actions, can I somehow force MongoDB to be synchronous so that my script doesn't turn into insanity? I want to be able to say document.save() (I'm using Mongoose, by the way) and have it not return until after it has actually saved.

Edit: Added code

This is called from a loop roughly 20,000 times. I don't care (within reason) how long it takes, but 200,000 async calls to save hang the script, so it can't be that (it also uses over 1.5 GB of RAM at that point). If I cannot make hObj.save(); wait until the object is actually saved, then I am going to need to write this in a more capable language.

    models('hs').findOne({name: r2.$.name}, function (err, h) {
        if (err) {
            console.log(err);
        } else {
            var resultObj = createResult(meeting, r1, r2);

            if (h == undefined) {

                var hObj = new models('hs')({
                    name : r2.$.name,
                    results : [resultObj],
                    numResults : 1
                });

                hObj.save();
            } else {
                h.results.push(resultObj);
                h.numResults++;
                h.save();
            }
        }
    });

user3690202 asked May 31 '14 10:05


1 Answer

From the async GitHub page:

eachSeries(arr, iterator, callback)

The same as each, only iterator is applied to each item in arr in series. The next iterator is only called once the current one has completed. This means the iterator functions will complete in order.
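
To make the "next iterator is only called once the current one has completed" part concrete, here is a minimal sketch of how such a series loop can be written by hand. It is not the async library's actual source; eachSeriesSketch and the sample items are made-up names for illustration, and the real library handles more edge cases.

```javascript
// Minimal stand-in for async.eachSeries, for illustration only.
// The real library also guards against edge cases such as double callbacks.
function eachSeriesSketch(arr, iterator, done) {
  var index = 0;

  function next(err) {
    // Stop on the first error, or once every item has been handled
    if (err || index >= arr.length) {
      return done(err || null);
    }
    var item = arr[index++];
    // The following item is started only when this iterator calls next()
    iterator(item, next);
  }

  next();
}

// Hypothetical usage: each "save" finishes asynchronously, yet order is kept
var finished = [];
eachSeriesSketch(
  ['a', 'b', 'c'],
  function (item, callback) {
    // setTimeout stands in for an asynchronous database call
    setTimeout(function () {
      finished.push(item);
      callback();
    }, 10);
  },
  function (err) {
    console.log(err ? 'failed: ' + err : 'done: ' + finished.join(','));
  }
);
```

Because only one iterator is ever pending, memory use stays flat no matter how many items are in the array, which is the property the question is after.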

So, assuming you have your XML nodes in nodes:

    async.eachSeries(
      nodes,
      // This will be applied to every node in nodes, one at a time
      function (node, callback) {
        // meeting, r1 and r2 are taken from the question's code as-is
        models('hs').findOne({name: r2.$.name}, function (err, h) {
          if (err) {
            // Report the error to async; without this call the series never continues
            return callback(err);
          }

          // Assumes createResult is synchronous
          var resultObj = createResult(meeting, r1, r2);

          if (h == undefined) {

            var hObj = new models('hs')({
              name : r2.$.name,
              results : [resultObj],
              numResults : 1
            });

            hObj.save(function (err, p) {
              // Callback tells async that this node is done (or that the save failed)
              callback(err);
            });
          } else {
            h.results.push(resultObj);
            h.numResults++;
            h.save(function (err, p) {
              // Callback tells async that this node is done (or that the save failed)
              callback(err);
            });
          }
        });
      },
      // This will be executed when all nodes have been processed, or on the first error
      function (err) {
        if (err) {
          console.log(err);
        } else {
          console.log('done!');
        }
      }
    );
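
If your Node.js version supports async/await, the same one-at-a-time flow can also be written as a plain loop, which is probably the closest you can get to a save() that "does not return until it has saved". The sketch below is only an illustration under stated assumptions: createFakeModel is a hypothetical in-memory stand-in so the example runs without a database, and the node shape ({ name, result }) is made up.

```javascript
// Hypothetical in-memory stand-in for a Mongoose model, so the sketch runs
// without a database. Both methods resolve asynchronously, like real queries.
function createFakeModel() {
  const store = new Map();
  const later = (value) =>
    new Promise((resolve) => setTimeout(() => resolve(value), 1));
  return {
    store: store,
    findOne: (query) => later(store.get(query.name) || null),
    save: (doc) => {
      store.set(doc.name, doc);
      return later(doc);
    }
  };
}

// Processes nodes strictly one after another: each await completes before
// the next lookup starts, so only one operation is ever in flight.
async function processNodes(model, nodes) {
  for (const node of nodes) {
    const h = await model.findOne({ name: node.name });
    if (h == null) {
      await model.save({ name: node.name, results: [node.result], numResults: 1 });
    } else {
      h.results.push(node.result);
      h.numResults++;
      await model.save(h);
    }
  }
  return model.store;
}

// Hypothetical input: 'x' appears twice, so it is created once and updated once
processNodes(createFakeModel(), [
  { name: 'x', result: 1 },
  { name: 'y', result: 2 },
  { name: 'x', result: 3 }
])
  .then((store) => console.log(store.get('x'), store.get('y')))
  .catch((err) => console.error(err));
```

With real Mongoose the stand-in would be replaced by the model itself: in recent versions findOne(...).exec() and save() return promises when no callback is passed, so they can be awaited in the same way.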

M_rivermount answered Nov 14 '22 13:11