
use cursor as iterator with a query

I was reading about MongoDB and came across this part of the tutorial: http://www.mongodb.org/display/DOCS/Tutorial. It says:

> var cursor = db.things.find();
> printjson(cursor[4]);
{ "_id" : ObjectId("4c220a42f3924d31102bd858"), "x" : 4, "j" : 3 }

"When using a cursor this way, note that all values up to the highest accessed (cursor[4] above) are loaded into RAM at the same time. This is inappropriate for large result sets, as you will run out of memory. Cursors should be used as an iterator with any query which returns a large number of elements."

How do I use a cursor as an iterator with a query? Thanks for the help.

webminal.org asked Dec 02 '22 23:12


1 Answer

You've tagged that you're using pymongo, so I'll give you two pymongo examples using the cursor as an iterator:

import pymongo
# MongoClient replaces the older Connection class, which was removed in PyMongo 3.0
cursor = pymongo.MongoClient().test_db.test_collection.find()
for item in cursor:
    print(item)  # each item is printed as a dictionary

and

import pymongo
cursor = pymongo.MongoClient().test_db.test_collection.find()
results = [item['some_attribute'] for item in cursor]
# the list comprehension builds a list containing the value of some_attribute
# for each item in the collection

In addition, you can set the size of batches returned to the pymongo driver by doing this:

import pymongo
cursor = pymongo.MongoClient().test_db.test_collection.find()
cursor.batch_size(20)  # the server returns results to the driver in batches of 20 documents

It is usually unnecessary to change the batch size. If the machine running the driver has memory problems and is page faulting while you work through the query results, you might have to set it to get better performance. This really seems like a painful optimization to me, and I've always left the default.
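To make the batch size idea concrete, here is a small pure-Python sketch. FakeBatchedCursor is a hypothetical stand-in I made up for illustration, not a pymongo class, and the in-memory list plays the role of the server. It only models the idea that the client holds one batch at a time while you iterate; the real driver's buffering is more involved.

```python
class FakeBatchedCursor:
    """Hypothetical stand-in for a driver cursor; not part of pymongo."""

    def __init__(self, documents, batch_size):
        self._documents = documents    # pretend this list lives on the server
        self._batch_size = batch_size
        self.round_trips = 0           # number of simulated fetches
        self.max_buffered = 0          # most documents held client-side at once

    def __iter__(self):
        for start in range(0, len(self._documents), self._batch_size):
            # One simulated round trip: fetch the next batch only
            batch = self._documents[start:start + self._batch_size]
            self.round_trips += 1
            self.max_buffered = max(self.max_buffered, len(batch))
            for doc in batch:
                yield doc


documents = [{"x": i} for i in range(100)]
cursor = FakeBatchedCursor(documents, batch_size=20)

# Consume the cursor as an iterator; no full result list is ever built
total = sum(doc["x"] for doc in cursor)

print(total, cursor.round_trips, cursor.max_buffered)
```

In this model a smaller batch size means fewer documents buffered at once but more round trips, which is the trade-off you would be tuning.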

As for the JavaScript driver (the one that loads when you launch the shell), that part of the documentation is cautioning you not to use "array mode". From the online manual:

Array Mode in the Shell

Note that in some languages, like JavaScript, the driver supports an "array mode". Please check your driver documentation for specifics.

In the db shell, to use the cursor in array mode, use array index [] operations and the length property.

Array mode will load all data into RAM up to the highest index requested. Thus it should not be used for any query which can return very large amounts of data: you will run out of memory on the client.

You may also call toArray() on a cursor. toArray() will load all queried objects into RAM.
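The difference between array mode and plain iteration can be sketched in pure Python. ArrayModeCursor and make_documents are hypothetical names I made up for this illustration; this is a rough model of the behavior the manual describes, not the shell's actual implementation.

```python
class ArrayModeCursor:
    """Hypothetical model of array mode: indexing caches every document
    up to the highest index requested."""

    def __init__(self, source):
        self._source = iter(source)
        self._cache = []

    def __getitem__(self, index):
        # Pull documents from the source until the requested index is cached
        while len(self._cache) <= index:
            self._cache.append(next(self._source))
        return self._cache[index]

    def cached_count(self):
        return len(self._cache)


def make_documents(count):
    """Lazily produce documents, standing in for a query result stream."""
    for i in range(count):
        yield {"x": i}


# Array mode: asking for index 4 keeps documents 0 through 4 in memory
array_cursor = ArrayModeCursor(make_documents(1000))
fifth = array_cursor[4]
held_after_index = array_cursor.cached_count()

# Iterator mode: each document can be discarded once it has been processed
seen = 0
for doc in make_documents(1000):
    seen += 1

print(fifth, held_after_index, seen)
```

Asking for a very high index, or calling something like toArray(), is where this model ends up with the whole result set in the cache.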

marr75 answered Dec 15 '22 00:12