 

Is this possible to lazily query the database with mongoengine (python)?

I have a memory issue with mongoengine (in python).

Let's say I have a very large number of custom_documents (several thousand). I want to process them all, like this:

for item in custom_documents.objects():
    process(item)

The problem is that custom_documents.objects() loads every object into memory, and my app uses several GB ...

How can I make this more memory-efficient? Is there a way to make mongoengine query the DB lazily (requesting objects only as we iterate over the queryset)?

asked Feb 22 '23 by Tewfik


2 Answers

According to the docs (and in my experience), collection.objects returns a lazy QuerySet. Your first problem might be that you're calling the objects attribute rather than just using it as an iterable. I feel like there must be some other reason your app is using so much memory; perhaps process(item) keeps a reference to each document somehow? Try the following code and check your app's memory usage:

queryset = custom_documents.objects
print(queryset.count())

Since QuerySets are lazy, you can also do things like custom_documents.objects.skip(500).limit(100) in order to return only documents 500 through 599.
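
The skip/limit idea above can be sketched as a paginated loop. This is a minimal simulation using a plain Python list as a stand-in for the database; in real code, `fetch_page` would wrap something like `custom_documents.objects.skip(offset).limit(limit)` (the `iter_in_pages` helper and `fake_fetch` are hypothetical names for illustration):

```python
def iter_in_pages(fetch_page, page_size=100):
    """Lazily yield items one page at a time.

    fetch_page(offset, limit) must return a list of at most `limit`
    items starting at `offset` -- with mongoengine this would wrap
    custom_documents.objects.skip(offset).limit(limit).
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            break
        for item in page:
            yield item  # caller processes one item at a time
        offset += page_size

# Simulated "collection" of 250 documents (stand-in for MongoDB).
data = list(range(250))

def fake_fetch(offset, limit):
    return data[offset:offset + limit]

processed = [item for item in iter_in_pages(fake_fetch, page_size=100)]
```

Only one page (at most 100 items here) is held in memory at a time, which is the property you want when processing thousands of documents.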

answered Feb 24 '23 by jjm


I think you want to look at querysets - these are the MongoEngine wrappers for cursors:

http://mongoengine.org/docs/v0.4/apireference.html#querying

They let you control the number of objects returned, essentially taking care of the batch size settings etc. that you can set directly in the pymongo driver:

http://api.mongodb.org/python/current/api/pymongo/cursor.html

Cursors generally behave this way by default; you have to go out of your way to make them return everything in one shot, even in the native mongodb shell.
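
To see why a cursor is memory-friendly, here is a toy model (not the real pymongo API) of how one fetches documents: each "round trip" to the server pulls only one batch, and documents are handed to the caller as it iterates. The `FakeCursor` class and its batch sizes are illustrative assumptions:

```python
class FakeCursor:
    """Toy model of a MongoDB cursor: documents are pulled from the
    'server' in batches only as the caller iterates, never all at once."""

    def __init__(self, server_docs, batch_size=101):
        self._docs = server_docs
        self._batch_size = batch_size
        self.round_trips = 0  # how many times we "hit the server"

    def __iter__(self):
        offset = 0
        while offset < len(self._docs):
            self.round_trips += 1  # one getMore-style fetch per batch
            batch = self._docs[offset:offset + self._batch_size]
            offset += self._batch_size
            yield from batch  # at most one batch is in memory at a time

cursor = FakeCursor(list(range(1000)), batch_size=250)
seen = sum(1 for _ in cursor)  # 1000 docs consumed in 4 round trips
```

With the real driver you would tune this via the cursor's batch size rather than implementing it yourself; the point is that iteration, not construction, is what triggers the fetches.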

answered Feb 24 '23 by Adam Comerford