Pymongo: iterate over all documents in the collection

I am using PyMongo and trying to iterate over 10 million documents in my MongoDB collection, extract just two keys, "name" and "address", and write them to a .csv file.

I cannot figure out the right syntax to do it with find().forEach()

I was trying workarounds like

cursor = db.myCollection.find({"name": {"$regex": REGEX}})

where REGEX would match everything, and it resulted in "Killed". I also tried

cursor = db.myCollection.find({"name": {"$exist": True}})

but that did not work either.

Any suggestions?

asked Nov 30 '16 at 22:11 by mel




2 Answers

I cannot figure out the right syntax to do it with find().forEach()

cursor.forEach() is not available in Python; it's a JavaScript method of the mongo shell. Instead, get a cursor and iterate over it. See the PyMongo Tutorial: Querying for More Than One Document, where you can do:

for document in db.myCollection.find():
    print(document)  # iterate the cursor
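A slightly fuller sketch for the question's task, assuming `collection` is a PyMongo `Collection` such as `db.myCollection`. The function name `iter_name_address` and the batch size of 1000 are illustrative choices, not part of the answer. The projection asks the server to return only `name` and `address`, and iterating the cursor lazily means documents arrive in batches instead of all being held in memory at once.

```python
from typing import Any, Iterator, Tuple


def iter_name_address(collection, batch_size: int = 1000) -> Iterator[Tuple[Any, Any]]:
    """Yield (name, address) pairs from every document in `collection`.

    `collection` is assumed to be a pymongo Collection (e.g. db.myCollection).
    """
    # Empty filter = all documents; the projection keeps only the two
    # fields we need and drops _id.
    cursor = collection.find({}, {"name": 1, "address": 1, "_id": 0})
    # Cursor.batch_size() sets how many documents each round trip returns.
    cursor = cursor.batch_size(batch_size)
    for document in cursor:  # the cursor is consumed lazily, batch by batch
        # .get() returns None when a document lacks the field.
        yield document.get("name"), document.get("address")
```

Usage: `for name, address in iter_name_address(db.myCollection): ...`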

where REGEX would match everything - and it resulted in "Killed".

Unfortunately, there isn't enough information here to debug what 'Killed' means or why it happened. That said, if you would like to match everything, you can just write:

cursor = db.myCollection.find({"name": {"$regex": ".*"}})

This assumes the field name contains string values. Even so, checking whether the field name exists with $exists would be preferable to using a regex.
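In PyMongo the filter is an ordinary Python dict: operator names such as `$regex` are quoted string keys and the pattern is a Python `str`, because mongo shell regex literals like `/.*/` are JavaScript syntax, not Python. PyMongo also accepts a compiled `re.Pattern` as the field value. A minimal sketch of both forms; the `matches_all` helper is hypothetical and only checks patterns locally with Python's `re`, not with MongoDB.

```python
import re

# Two equivalent "match any string" filters for the name field.
match_all_operator = {"name": {"$regex": ".*"}}
match_all_compiled = {"name": re.compile(".*")}  # PyMongo encodes this as a BSON regex


def matches_all(pattern: str, values) -> bool:
    """Hypothetical local check with Python's re module (not MongoDB)."""
    return all(re.search(pattern, value) is not None for value in values)


# With a running server: cursor = db.myCollection.find(match_all_operator)
```

Keep in mind that a regex only matches string values, so documents where `name` is missing or not a string are not returned by either filter.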

However, the $exists operator in your example above is misspelled: you're missing the s in $exists. Again, there isn't enough information about what 'didn't work' meant to debug further.
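With the spelling fixed, the filter becomes `{"name": {"$exists": True}}`. The sketch below combines it with a projection and writes the question's CSV using the standard `csv` module. `export_to_csv` and its parameters are hypothetical names, and `collection` is assumed to be a PyMongo `Collection`. Note that `$exists: True` also matches documents where `name` is present but `null`.

```python
import csv

# Correct spelling: $exists (with the trailing s).
NAME_EXISTS = {"name": {"$exists": True}}
FIELDS = ["name", "address"]


def export_to_csv(collection, path: str) -> int:
    """Write name/address of every document that has a name; return the row count."""
    cursor = collection.find(NAME_EXISTS, {"name": 1, "address": 1, "_id": 0})
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as f:
        # restval fills missing fields with ""; extrasaction ignores unexpected keys.
        writer = csv.DictWriter(f, fieldnames=FIELDS, restval="", extrasaction="ignore")
        writer.writeheader()
        for document in cursor:  # streamed, never loaded into a list
            writer.writerow(document)
            count += 1
    return count
```

Usage: `export_to_csv(db.myCollection, "names.csv")`.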

If you're writing this script as a Python exercise, I would recommend reviewing:

  • PyMongo Tutorial
  • MongoDB Tutorial: query documents

You could also enrol in a free online course at MongoDB University for M220P: MongoDB for Python Developers.

However, if you are just trying to export a collection to CSV, you could use MongoDB's mongoexport instead, which supports:

  • Exporting specific fields via --fields "name,address"
  • Exporting in CSV via --type "csv"
  • Exporting specific values with query via --query "..."

See mongoexport usage for more information.
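If you still want to drive the export from Python, here is a sketch that builds the mongoexport command and runs it with `subprocess`. The database name, output path and the `build_mongoexport_args` / `run_export` helpers are placeholders, and it assumes the mongoexport binary (part of the MongoDB Database Tools) is on your PATH.

```python
import json
import subprocess
from typing import List, Optional


def build_mongoexport_args(db: str, collection: str, out_path: str,
                           query: Optional[dict] = None) -> List[str]:
    """Hypothetical helper: assemble a mongoexport command line as a list."""
    args = [
        "mongoexport",
        "--db", db,
        "--collection", collection,
        "--type", "csv",             # CSV output needs --fields (or --fieldFile)
        "--fields", "name,address",
        "--out", out_path,
    ]
    if query is not None:
        # --query takes a JSON document, e.g. {"name": {"$exists": true}}
        args += ["--query", json.dumps(query)]
    return args


def run_export(db: str, collection: str, out_path: str) -> None:
    # check=True raises CalledProcessError if mongoexport exits non-zero.
    subprocess.run(build_mongoexport_args(db, collection, out_path), check=True)
```

Passing the arguments as a list avoids shell quoting problems with the JSON query.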

answered Oct 21 '22 at 23:10 by Wan Bachtiar


I had no luck with .find().forEach() either, but this should find what you are searching for and then print it.

First, find all documents that match what you are searching for:

cursor = db.myCollection.find({"name": {"$regex": REGEX}})

then iterate over the matches:

for document in cursor:
    print(document.get("name"))
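Because PyMongo returns each document as a plain `dict`, calling `.get("name")` on it yields `None` for documents without that field, whereas subscripting with `["name"]` raises `KeyError`. A small sketch of that difference, using made-up in-memory dicts as stand-ins for query results; `names` and `names_strict` are hypothetical helpers.

```python
# Stand-ins for documents returned by a PyMongo cursor (sample data, not real results).
documents = [{"name": "Ann", "address": "1 Main St"}, {"address": "2 Side St"}]


def names(docs, default=None):
    # dict.get() tolerates documents that lack the "name" field.
    return [doc.get("name", default) for doc in docs]


def names_strict(docs):
    # Subscripting raises KeyError on the first document without "name".
    return [doc["name"] for doc in docs]
```

`names(documents)` gives `["Ann", None]`, while `names_strict(documents)` raises `KeyError`.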
answered Oct 22 '22 at 01:10 by GodIsAnAstronaut