I have a simple, single-client setup for MongoDB and PyMongo 2.6.3. The goal is to iterate over each document in the collection named collection and update (save) each document in the process. The approach I'm using looks roughly like:
cursor = collection.find({})
index = 0
count = cursor.count()
while index != count:
    doc = cursor[index]
    print 'updating doc ' + doc['name']
    # modify doc ..
    collection.save(doc)
    index += 1
cursor.close()
The problem is that save is apparently modifying the order of documents in the cursor. For example, if my collection is made of 3 documents (ids omitted for clarity):
{
"name": "one"
}
{
"name": "two"
}
{
"name": "three"
}
the above program outputs:
> updating doc one
> updating doc two
> updating doc two
If, however, the line collection.save(doc) is removed, the output becomes:
> updating doc one
> updating doc two
> updating doc three
Why is this happening? What is the right way to safely iterate and update documents in a collection?
I couldn't reproduce your situation, but off the top of my head: fetching the results the way you do gets them one by one from the database, so by saving a document and then fetching the next one you may be changing what the following fetches return as you go.
You can try holding the results in a list (that way you're fetching all the results at once, which might be heavy depending on your query):
cursor = collection.find({})
# index = 0  (no longer needed)
results = [res for res in cursor]  # replaces: count = cursor.count()
cursor.close()
for res in results:  # replaces: while index != count (iterates the list without needing a counter)
    # replaces: doc = cursor[index] (no need for it, since res holds the current record in this loop cycle)
    print 'updating doc ' + res['name']  # was: print 'updating doc ' + doc['name']
    # modify doc ..
    collection.save(res)
    # replaces: index += 1 (again, no need for a counter)
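If holding every full document in a list is too heavy, one variation on the list approach above is to hold only the _id values and re-read each document just before saving it. This is only a rough sketch, not something I have run against your setup: update_each and modify are made-up names for illustration, and it assumes a PyMongo version where collection.save() is still available, as in the question.

```python
def update_each(collection, modify):
    """Apply modify(doc) to every document and save it, visiting each _id once."""
    # Snapshot only the _id values up front, so later saves cannot
    # change which documents the loop visits.
    ids = [doc['_id'] for doc in collection.find({})]
    updated = 0
    for _id in ids:
        # Re-read the current version of the document by its _id.
        doc = collection.find_one({'_id': _id})
        if doc is None:
            # The document was removed after the snapshot was taken.
            continue
        modify(doc)           # hypothetical callback that edits doc in place
        collection.save(doc)  # same save() call as in the question
        updated += 1
    return updated
```

It would be called as update_each(collection, some_function), where some_function changes the document it is given. The trade-off is one extra find_one round trip per document in exchange for keeping only the ids in memory.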
Hope it helps