
How to quickly fetch all documents from MongoDB with pymongo

Currently I fetch documents by iterating through a cursor in pymongo, for example:

mylist = []
for d in db.docs.find():
    mylist.append(d)

For reference, performing a fetchall on the same set of data (7m records) takes around 20 seconds, while the method above takes a few minutes.

Is there a faster way to read bulk data in MongoDB? Sorry, I'm new to MongoDB; please let me know if more information is needed.

Rusty Shackleford Avatar asked Jul 28 '16 22:07





2 Answers

Using the $natural sort will bypass the index and return the documents in the order in which they are stored on disk, so MongoDB doesn't have to thrash around with random reads on your disk.

https://docs.mongodb.com/manual/reference/method/cursor.sort/#return-natural-order

The performance becomes severely degraded if you want to use a query. You should never rely on FIFO ordering, because MongoDB allows itself to move documents around within its storage layer. If you don't care about the order, so be it.

This ordering is an internal implementation feature, and you should not rely on any particular structure within it.

mylist = []
# pymongo's sort() takes a key and a direction; 1 is pymongo.ASCENDING
for d in db.docs.find().sort("$natural", 1):
    mylist.append(d)

In Python, you also want to use the EXHAUST cursor type, which tells the MongoDB server to stream back the results without waiting for the pymongo driver to request each batch.

https://api.mongodb.com/python/current/api/pymongo/cursor.html#pymongo.cursor.CursorType.EXHAUST
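A minimal sketch of what requesting an exhaust cursor might look like. The connection string, the database name "test", and the helper names fetch_all_exhaust and main are assumptions for illustration; only find(cursor_type=...) and CursorType.EXHAUST come from pymongo itself.

```python
def fetch_all_exhaust(collection, cursor_type):
    """Return every document in `collection` as a list, using `cursor_type`."""
    # list() drains the cursor. With an exhaust cursor the server keeps
    # sending batches without waiting for the driver to request each one.
    return list(collection.find({}, cursor_type=cursor_type))


def main():
    # pymongo is imported here because it is only needed when connecting.
    from pymongo import MongoClient
    from pymongo.cursor import CursorType

    # Assumed connection string and names; adjust for your deployment.
    client = MongoClient("mongodb://localhost:27017")
    docs = fetch_all_exhaust(client["test"]["docs"], CursorType.EXHAUST)
    print(len(docs))
```

Call main() against a running server to try it. As far as I know, exhaust cursors are not supported through mongos, so this is unlikely to help on a sharded cluster.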

Mind you, it'll never be as fast as the shell. The slowest part of moving data along the path mongo/bson->pymongo->you is UTF-8 string decoding within Python.

bauman.space Avatar answered Oct 21 '22 04:10



You only need to convert the cursor with the list() function:

pymongo_cursor = db.collection.find()
all_data = list(pymongo_cursor)
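Both list() and the explicit loop drain the same cursor, so the network and decoding cost is identical; list() only avoids the Python-level append loop. A rough way to check how much that matters is to time both forms. The sketch below uses a hypothetical fake_cursor generator as a stand-in for a real pymongo cursor, so the numbers only reflect the Python-side overhead:

```python
import time


def fake_cursor(n):
    """Stand-in for a pymongo cursor (an assumption): yields n small dicts."""
    for i in range(n):
        yield {"_id": i}


def with_loop(cursor):
    # The approach from the question: append one document at a time.
    result = []
    for d in cursor:
        result.append(d)
    return result


def with_list(cursor):
    # The approach from this answer: let list() consume the iterator.
    return list(cursor)


def timed(fn, cursor):
    """Run fn(cursor) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(cursor)
    return result, time.perf_counter() - start


loop_docs, loop_secs = timed(with_loop, fake_cursor(100_000))
list_docs, list_secs = timed(with_list, fake_cursor(100_000))
print(f"loop: {loop_secs:.4f}s  list(): {list_secs:.4f}s")
print("same result:", loop_docs == list_docs)
```

To measure a real collection, pass db.collection.find() in place of fake_cursor(...), creating a fresh cursor for each run.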
Mateus Roberto Avatar answered Oct 21 '22 05:10
