I need to read a whole collection from MongoDB (the collection name is "test") in Python. I tried something like:
from pymongo import Connection  # legacy API; pymongo 3+ replaces Connection with MongoClient

self.__connection__ = Connection('localhost', 27017)
dbh = self.__connection__['test_db']
collection = dbh['test']
How can I read through the collection in chunks of 1000, to avoid memory overflow, since the collection can be very large?
Retrieving documents one at a time from the server is very inefficient. The batch size controls how many documents the driver requests from the server in one round trip.
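For example, here is a minimal sketch using the modern MongoClient API (the database and collection names are taken from the question; process() is a hypothetical placeholder for your own handling):

from pymongo import MongoClient

client = MongoClient('localhost', 27017)
collection = client['test_db']['test']

# batch_size(1000) makes the driver fetch 1000 documents per round trip;
# iterating the cursor keeps only the current batch in memory.
for doc in collection.find().batch_size(1000):
    process(doc)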
I agree with Remon, but you mention batches of 1000, which his answer doesn't really cover. You can set a batch size on the cursor:
cursor.batch_size(1000)
You can also skip records, e.g.:
cursor.skip(4000)
Is this what you're looking for? This is effectively a pagination pattern. However, if you're just trying to avoid memory exhaustion, you don't really need to set a batch size or skip at all.
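To illustrate that last point, here is a sketch (assuming the collection handle from the question, with handle() as a hypothetical placeholder):

# Plain iteration already streams results in server-side batches;
# the full collection is never held in memory at once.
for doc in collection.find():
    handle(doc)

# If you explicitly want 1000-document pages (the pagination pattern),
# skip/limit works, though skip gets slower as the offset grows:
page_size = 1000
offset = 0
while True:
    page = list(collection.find().skip(offset).limit(page_size))
    if not page:
        break
    for doc in page:
        handle(doc)
    offset += page_size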