
Safe+efficient way to modify Mongo objects while iterating over a cursor?

I've got some code that examines every object in a Mongo collection (iterating over the result of a find() with no parameters), and makes changes to some of them. It seems that this isn't a safe thing to do: my changes are saved, but then when I continue iterating through the cursor, a subset of the changed objects (10-15%) show up a second time. I wasn't changing the document ID or anything that there's an index on.

I figure I could avoid this problem by grabbing all the document IDs ahead of time (convert the cursor to an array), but these are large collections so I'd really like to avoid that.

I noticed that the result of find() doesn't seem to have any defined order by default, so I tried putting an explicit sort on the cursor, {"_id":1}. This seems to have fixed the problem: now nothing shows up twice no matter what I modify. But I don't know if that's a good or reliable approach. As far as I can tell from the documentation, adding a sort does not make it pre-query all the IDs. If that's right, it's nice, but then I don't understand why the sort would fix the problem.

Is it just a bad idea to use cursors while changing stuff?

I'm using Scala/Casbah, if that matters.

Eli Bishop asked Apr 17 '12 21:04


2 Answers

It sounds like what you want is a snapshot query. Here's more info on how to do that:

http://www.mongodb.org/display/DOCS/How+to+do+Snapshotted+Queries+in+the+Mongo+Database

mpobrien answered Oct 12 '22 21:10


Consider using an update command that modifies multiple documents: http://docs.mongodb.org/manual/tutorial/modify-documents/

Also, since you are only modifying some objects, consider using a query that only returns documents that you are actually going to modify rather than scanning the entire collection.
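As a rough illustration of the two suggestions above (a selective query plus a single multi-document update), here is a minimal Python sketch. It is written against PyMongo's `Collection.update_many` rather than Casbah, and the field names `last_seen` and `stale`, as well as the helper `mark_stale`, are hypothetical and only there for the example.

```python
from typing import Any


def mark_stale(collection: Any, cutoff: int) -> int:
    """Flag old documents in one server-side update instead of a client-side loop.

    `collection` is assumed to behave like a PyMongo Collection.
    `last_seen` and `stale` are made-up field names for illustration.
    """
    # Match only the documents that actually need changing.
    query = {"last_seen": {"$lt": cutoff}, "stale": {"$ne": True}}
    # Describe the change with an update operator; no cursor is iterated.
    update = {"$set": {"stale": True}}
    result = collection.update_many(query, update)
    return result.modified_count
```

Because the server applies the change to every matching document, the client never holds a cursor open while documents are being modified. This only works when the change can be expressed with update operators.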

Iterating over the result of a find and modifying objects may seem more convenient and flexible, as you are not limited to what you can do with update operators, and you can write code in your language of choice to modify the document. However, there is the problem you described as well as other limitations:

http://docs.mongodb.org/manual/faq/developers/#faq-developers-isolate-cursors

For example, snapshot queries are not 100% safe, and they cannot be used with sharded collections, so if you later decide to shard, your solution will break.
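To make the duplicate-visit problem from the question more concrete, here is a toy model in Python. It is not MongoDB code. It assumes a storage layout in which an updated document that outgrows its slot is moved to the end of the data file, and it compares a scan in storage order with a traversal in ascending id order. Whether this matches your server version and storage engine is an assumption, and `ToyStore` and `visit_all` are hypothetical names.

```python
from typing import Callable, Iterable, Iterator, List, Optional


class ToyStore:
    """Toy data file: each slot holds a document id, or None after a move."""

    def __init__(self, ids: Iterable[int]) -> None:
        self.slots: List[Optional[int]] = list(ids)

    def grow(self, doc_id: int) -> None:
        """Simulate an update that outgrows its slot: move the document to the end."""
        self.slots[self.slots.index(doc_id)] = None
        self.slots.append(doc_id)

    def scan_natural(self) -> Iterator[int]:
        """Yield ids in storage order, including slots appended during the scan."""
        pos = 0
        while pos < len(self.slots):
            doc_id = self.slots[pos]
            pos += 1
            if doc_id is not None:
                yield doc_id

    def scan_by_id(self) -> Iterator[int]:
        """Yield ids in ascending order, always continuing after the last id seen."""
        last: Optional[int] = None
        while True:
            later = [d for d in self.slots
                     if d is not None and (last is None or d > last)]
            if not later:
                return
            last = min(later)
            yield last


def visit_all(store: ToyStore,
              scan: Callable[[], Iterator[int]],
              should_modify: Callable[[int], bool]) -> List[int]:
    """Iterate with `scan`, growing each matching document the first time it is seen."""
    visited: List[int] = []
    modified = set()
    for doc_id in scan():
        visited.append(doc_id)
        if should_modify(doc_id) and doc_id not in modified:
            modified.add(doc_id)
            store.grow(doc_id)
    return visited


def is_even(doc_id: int) -> bool:
    return doc_id % 2 == 0


a = ToyStore(range(1, 7))
print(visit_all(a, a.scan_natural, is_even))  # [1, 2, 3, 4, 5, 6, 2, 4, 6]

b = ToyStore(range(1, 7))
print(visit_all(b, b.scan_by_id, is_even))    # [1, 2, 3, 4, 5, 6]
```

In this model the storage-order scan meets every moved document a second time, while the id-ordered traversal does not, because a move changes a document's position but not its id. That is consistent with the behaviour reported in the question, but it only covers updates that leave the sort key untouched.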

If you need to modify a very large number of objects in a more complicated way, map-reduce or the aggregation pipeline may be a way to solve your problem:

http://docs.mongodb.org/manual/core/aggregation-pipeline/

http://docs.mongodb.org/manual/core/map-reduce/

Florian Winter answered Oct 12 '22 22:10