I need to check if a find statement returns a non-empty query.
What I was doing was the following:
query = collection.find({"string": field})
if not query:
    # do something
Then I realized that my if statement was never executed because find returns a cursor whether the query is empty or not.
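The effect can be reproduced without a database. In Python, an object that defines neither __bool__ nor __len__ is always truthy, so "if not query" never fires. The sketch below assumes the cursor behaves like such a plain iterable object; FakeCursor is a hypothetical stand-in written for illustration, not PyMongo's Cursor class.

```python
class FakeCursor:
    """Hypothetical stand-in for a cursor: iterable, but no __bool__ or __len__."""

    def __init__(self, docs):
        self._docs = list(docs)

    def __iter__(self):
        return iter(self._docs)


empty = FakeCursor([])

# Truthiness falls back to the default (always True), so this branch never runs.
branch_taken = False
if not empty:
    branch_taken = True

# Looking at the contents instead gives the expected answer.
is_empty = next(iter(empty), None) is None
```

Here branch_taken stays False even though the stand-in holds no documents, while is_empty is True.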
Therefore I checked the documentation and found two methods that can help me:
count(with_limit_and_skip=False)
which (from the description):
Returns the number of documents in the results set for this query.
It seems a good way to check, but this means that I need to count all the results in cursor to know if it is zero or not, right? A little bit expensive?
retrieved
which (from the description):
The number of documents retrieved so far.
I tested it on an empty query set and it returns zero, but it's not clear what it does and I don't know if it's right for me.
So, which is the best way (best practice) to check if a find() query returns an empty set or not? Is one of the methods described above right for this purpose? And what about performance? Are there other ways to do it?
Just to be clear: I need to know if the query is empty and I'd like to find the best way with the cursor with respect to performance and being pythonic.
How to check if the Cursor object is empty or not? Approach 1: The cursor returned is an iterable, so we can convert it into a list. If the length of the list is zero (i.e. the list is empty), the cursor is empty as well. Note that this loads every matching document into memory.
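A lighter variant of the list approach above is to fetch only the first item and then put it back, so nothing else is loaded. This is a minimal sketch for any Python iterable; the helper name peek is made up for illustration, and a plain generator stands in for the cursor.

```python
from itertools import chain


def peek(iterable):
    """Return (is_empty, iterator); the iterator still yields every item."""
    it = iter(iterable)
    try:
        first = next(it)
    except StopIteration:
        # Nothing to yield: report empty and hand back an empty iterator.
        return True, iter(())
    # Re-attach the item we consumed so the caller sees the full sequence.
    return False, chain([first], it)


# A generator stands in for a cursor here.
is_empty, docs = peek(doc for doc in [{"string": "a"}, {"string": "b"}])
remaining = list(docs)
```

After the call, is_empty is False and remaining still contains both documents.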
Manually iterating a cursor: in MongoDB, the find() method returns a cursor, and to access the documents we need to iterate it. In the mongo shell, if the returned cursor is not assigned to a variable with the var keyword, the shell automatically iterates the cursor to print up to the first 20 documents.
A cursor is basically a tool for iterating over MongoDB query result sets. A cursor instance is returned by the find() method.
The cursor points at the set of documents returned when find() is executed. By default it is consumed in a loop, but a document at a specific index can also be fetched explicitly, much like a pointer to a specific position in the results.
EDIT: While this was true in 2014, modern versions of pymongo and MongoDB have changed this behaviour. Buyer beware:
.count() is the correct way to find the number of results that are returned in the query. The count() method does not exhaust the iterator for your cursor, so you can safely do a .count() check before iterating over the items in the result set.
Performance of the count method was greatly improved in MongoDB 2.4. The only thing that could slow down your count is whether or not the query can use an index. To find out if you have an index on the query, you can do something like
query = collection.find({"string": field})
print(query.explain())
If you see BasicCursor in the result, you need an index on your string field for this query.
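On MongoDB 3.0 and later the explain output no longer reports BasicCursor; an unindexed query shows up as a plan stage named COLLSCAN instead. The helper below is a rough sketch that walks a plan dictionary looking for that stage. The function name is hypothetical, and the sample dictionaries are hand-written illustrations of the shape, not real server output.

```python
def uses_collection_scan(plan):
    """Recursively look for a COLLSCAN stage in (part of) an explain() result."""
    if isinstance(plan, dict):
        if plan.get("stage") == "COLLSCAN":
            return True
        return any(uses_collection_scan(value) for value in plan.values())
    if isinstance(plan, list):
        return any(uses_collection_scan(value) for value in plan)
    return False


# Hand-written examples of the shape, for illustration only.
unindexed_plan = {"stage": "COLLSCAN", "direction": "forward"}
indexed_plan = {
    "stage": "FETCH",
    "inputStage": {"stage": "IXSCAN", "indexName": "string_1"},
}

needs_index = uses_collection_scan(unindexed_plan)
already_indexed = not uses_collection_scan(indexed_plan)
```

With a live cursor, one would typically pass query.explain()["queryPlanner"]["winningPlan"] so that rejected plans are not inspected.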
EDIT: as @alvapan pointed out, pymongo deprecated this method in pymongo 3.7+ and now prefers you to use count_documents in a separate query.
item_count = collection.count_documents({"string": field})
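If only emptiness matters, the count does not have to cover every match. A possible sketch, assuming collection is a PyMongo 3.7+ Collection: count_documents accepts a limit option, so the count can stop at the first match. The helper name has_match is made up for illustration.

```python
def has_match(collection, query_filter):
    """Return True if at least one document matches query_filter.

    Assumes `collection` is a pymongo Collection (PyMongo 3.7+).
    limit=1 caps the count at one, so the result is either 0 or 1.
    """
    return collection.count_documents(query_filter, limit=1) > 0
```

It would be called as has_match(collection, {"string": field}).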
The right way to count the number of items you've returned on a query is to check the .retrieved counter on the query after you iterate over it, or to enumerate the query in the first place:
# Using .retrieved
query = collection.find({"string": field})
for item in query:
    print(item)
print('Located {0:,} item(s)'.format(query.retrieved))
Or, another way:
# Using the built-in enumerate
query = collection.find({"string": field})
index = 0  # stays 0 if the query returns nothing
for index, item in enumerate(query, start=1):
    print(item)
print('Located {0:,} item(s)'.format(index))
How about just using find_one instead of find? Then you can just check whether you got a result or None. And if "string" is indexed, you can pass fields={"string": 1, "_id": 0} (the argument is named projection in PyMongo 3.0 and later), and thus make it an index-only (covered) query, which is even faster.
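The find_one suggestion could look roughly like this. It is a sketch that assumes PyMongo 3.0 or later, where the projection is passed as projection rather than fields; the helper name string_exists is hypothetical.

```python
def string_exists(collection, value):
    """Return True if some document has "string" equal to value.

    Assumes `collection` is a pymongo Collection (PyMongo 3.0+).
    The projection keeps only the "string" field, so an index on
    "string" may be able to answer the query by itself.
    """
    found = collection.find_one(
        {"string": value},
        projection={"string": 1, "_id": 0},
    )
    # find_one returns a single document, or None when nothing matches.
    return found is not None
```

Because find_one returns either a document or None, the result can be tested directly, unlike the cursor returned by find.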