I have a memory issue with mongoengine (in Python).
Let's say I have a very large number of custom_documents (several thousand). I want to process them all, like this:
for item in custom_documents.objects():
    process(item)
The problem is that custom_documents.objects()
loads every object into memory, and my app uses several GB ...
How can I make this more memory-efficient? Is there a way to make mongoengine query the DB lazily (requesting objects as we iterate over the queryset)?
According to the docs (and in my experience), collection.objects returns a lazy QuerySet
. Your first problem might be that you're calling the objects
attribute rather than just using it as an iterable. I feel like there must be some other reason your app is using so much memory; perhaps process(item)
stores a reference to it somehow? Try the following code and check your app's memory usage:
queryset = custom_documents.objects
print(queryset.count())
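If it turns out the queryset itself is holding on to documents as you iterate, one option (a sketch, assuming a mongoengine version recent enough to provide QuerySet.no_cache()) is to disable the queryset's internal result cache:

# Sketch, assuming mongoengine 0.8.3+: no_cache() stops the QuerySet
# from keeping every yielded document in its result cache, so each
# document can be garbage-collected once process() is done with it.
for item in custom_documents.objects.no_cache():
    process(item)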
Since QuerySets
are lazy, you can also do things like custom_documents.objects.skip(500).limit(100)
in order to return documents 500-599 only.
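For example, a rough sketch of processing the collection in fixed-size pages (page_size is a hypothetical chunk size; note that skip() gets slower as the offset grows on very large collections):

page_size = 100  # hypothetical chunk size
total = custom_documents.objects.count()
for offset in range(0, total, page_size):
    # Each page is a fresh queryset, so at most page_size
    # documents are held in memory at a time.
    for item in custom_documents.objects.skip(offset).limit(page_size):
        process(item)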
I think you want to look at querysets - these are MongoEngine's wrapper around cursors:
http://mongoengine.org/docs/v0.4/apireference.html#querying
They let you control the number of objects returned, essentially taking care of the batch size settings etc. that you can set directly in the pymongo driver:
http://api.mongodb.org/python/current/api/pymongo/cursor.html
Cursors generally behave this way by default; you have to go out of your way to get them to return everything in one shot, even in the native mongodb shell.
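As a rough illustration at the pymongo level (a sketch, assuming a local mongod and hypothetical database/collection names; batch_size() only controls how many documents are fetched per round trip, while the cursor still yields them one at a time):

from pymongo import MongoClient

client = MongoClient()  # assumes a local mongod; adjust the URI as needed
collection = client["mydb"]["custom_documents"]  # hypothetical names

# The cursor is lazy: documents are pulled from the server in
# batches of 100 as you iterate, never all at once.
for doc in collection.find().batch_size(100):
    process(doc)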