How best to 'tail -f' a large collection in Mongo through Meteor?

I have a collection in a Mongo database to which I append logging-type information. I'm trying to figure out the most efficient/simplest way to "tail -f" it in a Meteor app: as a new document is added to the collection, it should be sent to the client, which should append it to the end of the current set of documents in the collection.

The client won't be sent, nor keep, all of the documents in the collection; likely just the last ~100 or so.

Now, from a Mongo perspective, I don't see a way of saying "the last N documents in the collection" without applying a sort at all. The best option available seems to be a descending natural sort followed by a limit, something like what's listed in the Mongo doc on $natural:

db.collection.find().sort( { $natural: -1 } )
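To make the ordering concrete, here is a plain JavaScript model (no MongoDB involved) of what a descending natural sort plus a limit returns for an insert-only log. It assumes natural order matches insertion order, which MongoDB only guarantees for capped collections; `naturalDescLimit` is a hypothetical helper, and the array stands in for the collection.

```javascript
// Model of db.collection.find().sort({ $natural: -1 }).limit(n) on an
// insert-only collection, ASSUMING natural order == insertion order.
// naturalDescLimit is a hypothetical helper, not a MongoDB or Meteor API.
function naturalDescLimit(docsInInsertOrder, n) {
  // Copy before reversing so the "collection" itself is left untouched.
  return docsInInsertOrder.slice().reverse().slice(0, n);
}

const log = [];
for (let i = 1; i <= 5; i++) {
  log.push({ seq: i, msg: `line ${i}` });
}

const lastThree = naturalDescLimit(log, 3);
console.log(lastThree.map((d) => d.seq)); // [ 5, 4, 3 ]: newest first
```

The newest document comes first, so whoever consumes this result still has to reverse it to get tail -f order.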

So, on the server side, AFAICT the way to publish this 'last 100 documents' Meteor collection would be something like:

Meteor.publish('logmessages', function () {
  return LogMessages.find({}, { sort: { $natural: -1 }, limit: 100 });
});

Now, from a 'tail -f' perspective, this seems to have the right effect of sending the 'last 100 documents' to the client, but in the wrong order (the newest document ends up at the start of the Meteor collection instead of at the end).

On the client side, this seems to mean (unfortunately) having to reverse the collection. I don't see a reverse() in the Meteor Collection docs, and sorting by $natural: 1 doesn't work on the client (which seems reasonable, since there's no real Mongo context there). In some cases the messages will have timestamps within the documents, and the client could sort by those to get the 'natural order' back, but that seems kind of hacky.

In any case, it feels like I'm missing a much simpler way to have a live 'last 100 documents inserted into the collection' collection published from Mongo through Meteor. :)

Thanks!

EDIT: it looks like if I change the collection in Mongo to a capped collection, the server could create a tailable cursor to efficiently (and quickly) get notified of new documents added to the collection. However, it's not clear to me if/how the server can do so through a Meteor collection.
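A capped collection keeps documents in insertion order and, once full, overwrites the oldest ones; a tailable cursor on it first returns the existing documents and then stays open, yielding new documents as they are appended, much like `tail -f`. The following is only an in-memory analogy of those semantics in plain JavaScript. `CappedLog`, `append`, and `tail` are hypothetical names, not MongoDB or Meteor APIs, and a real capped collection is bounded by size in bytes (optionally also by a maximum document count) rather than purely by count as modeled here.

```javascript
// In-memory analogy of a capped collection plus a tailable cursor.
// CappedLog is hypothetical; it is bounded by document count only, whereas
// a real MongoDB capped collection is bounded by size in bytes (and,
// optionally, a maximum document count).
class CappedLog {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];      // kept in insertion order, oldest first
    this.listeners = []; // callbacks registered via tail()
  }

  append(doc) {
    this.docs.push(doc);
    // Drop the oldest documents once the cap is exceeded.
    while (this.docs.length > this.maxDocs) {
      this.docs.shift();
    }
    // Push the new document to every "tailable cursor", like tail -f.
    for (const listener of this.listeners) {
      listener(doc);
    }
  }

  // Replay what is currently stored (oldest first), then follow new appends.
  // Returns a function that stops following.
  tail(listener) {
    for (const doc of this.docs) {
      listener(doc);
    }
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }
}

const capped = new CappedLog(3);
for (let i = 1; i <= 4; i++) capped.append({ seq: i });

const seen = [];
const stop = capped.tail((doc) => seen.push(doc.seq)); // replays 2, 3, 4
capped.append({ seq: 5 });                             // delivers 5
stop();
capped.append({ seq: 6 });                             // not delivered
console.log(seen); // [ 2, 3, 4, 5 ]
```

Because the buffer is already in insertion order, a follower sees the oldest retained document first and each new one at the end, which is exactly the ordering the question is after; no reversal is needed.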

An alternative that seems a little less efficient, but (AFAICT) doesn't require switching to a capped collection, is using Smart Collections, which tails the oplog. That at least makes it event-driven instead of polling, and since all operations on the source collection will be inserts, it seems like it'd still be pretty efficient. Unfortunately, AFAICT I'm still left with the sorting issue, since I don't see how to define the server-side collection as 'last 100 documents inserted'. :(

If there's a way to create a collection in Mongo as a query over another one (a "materialized view" of sorts), then maybe I could create a log-last-100 "collection view" in Mongo, and Meteor could simply publish/subscribe the entire pseudo-collection?

James Manning asked Oct 06 '13 22:10


1 Answer

For insert-only data, $natural should get you the same results as indexing on a timestamp and sorting, so that's a good idea. The reverse thing is unfortunate; I think you have a couple of choices:

  1. use $natural and do the reverse yourself
  2. add timestamp, still use $natural
  3. add timestamp, index by time, sort

'#1' - For 100 items, doing the reverse client-side should be no problem even on mobile devices, and it offloads that work from the server. You can use .fetch() to convert the cursor to an array and then reverse it to get the right order without needing timestamps. You'll be in normal array-land though, with no more nice mini-mongo features, so do any filtering first (in the find() query) before fetching and reversing.
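A minimal sketch of option #1 in plain JavaScript, assuming the client holds the newest-first array that the publication produced. In Meteor that array would come from something like `LogMessages.find(selector).fetch()`; here a literal array stands in for it so the example runs on its own. The `toTailOrder` helper and the `level` field are hypothetical, and the optional predicate stands in for filtering that would ideally happen in the `find()` selector while mini-mongo is still available.

```javascript
// Sketch of option #1: reverse a small, newest-first array on the client.
// `fetched` stands in for the result of cursor.fetch(); the `level` field
// and toTailOrder helper are hypothetical.
function toTailOrder(fetched, keep = () => true) {
  // filter() returns a new array, so reverse() below mutates only that copy,
  // never `fetched`. Result: oldest of the window first, newest last.
  return fetched.filter(keep).reverse();
}

const fetched = [
  { seq: 5, level: "info" },
  { seq: 4, level: "error" },
  { seq: 3, level: "info" },
  { seq: 2, level: "error" },
];

const all = toTailOrder(fetched);
const errorsOnly = toTailOrder(fetched, (d) => d.level === "error");
console.log(all.map((d) => d.seq));        // [ 2, 3, 4, 5 ]
console.log(errorsOnly.map((d) => d.seq)); // [ 2, 4 ]
```

For ~100 documents both the filter and the reverse are trivial linear passes, which is why doing this on the client is cheap.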

'#2' - This one is interesting because you don't have to use an index, but you can still use the timestamp on the client to sort the records. This gives you the benefit of staying in mini-mongo-land.
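For option #2, the server can keep publishing with `$natural: -1` and a limit, while each document carries a timestamp the client sorts on. In mini-mongo that is just a sort specifier on the field, e.g. `LogMessages.find({}, { sort: { createdAt: 1 } })`; the plain JavaScript comparator below only models what that sort does. The `createdAt` field name is an assumption; any value that increases with insertion time and is written when the document is inserted would serve.

```javascript
// Sketch of option #2: restore chronological order from a timestamp field.
// createdAt is an assumed field set when each log document is inserted.
function chronological(docs) {
  // Copy, then sort ascending by createdAt (oldest first, newest last).
  // Array.prototype.sort is stable, so equal timestamps keep their input order.
  return docs.slice().sort((a, b) => a.createdAt - b.createdAt);
}

const newestFirst = [
  { msg: "c", createdAt: new Date("2013-10-06T22:10:03Z") },
  { msg: "b", createdAt: new Date("2013-10-06T22:10:02Z") },
  { msg: "a", createdAt: new Date("2013-10-06T22:10:01Z") },
];

const ordered = chronological(newestFirst);
console.log(ordered.map((d) => d.msg)); // [ 'a', 'b', 'c' ]
```

Unlike reversing, this does not depend on the order in which documents happen to arrive on the client, which is what keeps it usable with further mini-mongo queries.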

'#3' - Costs space on the db for the index, but it's the most straightforward.

If you don't need the capabilities of mini-mongo (or are comfortable doing array filtering yourself) then #1 is probably best.

Unfortunately, MongoDB doesn't have views (read-only views only arrived later, in MongoDB 3.4), so you can't do your log-last-100 view idea, although that would be a nice feature.

Beyond the above, keep an eye on your subscription lifecycle so users don't keep pulling down log updates in the background when they aren't viewing the log. I could see that quickly becoming a performance killer.

alanning answered Sep 29 '22 00:09