How to check if a pymongo cursor has query results

Tags: python, mongodb, mongodb-query, pymongo

I need to check if a find statement returns a non-empty query.

What I was doing was the following:

    query = collection.find({"string": field})
    if not query:
        pass  # do something

Then I realized that the body of my if statement never ran, because find returns a cursor object whether or not the query matches anything.
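To illustrate why the check never fires: Python treats an object as truthy unless its class defines `__bool__` or `__len__`, and the cursor returned by find appears to define neither. A minimal sketch with a hypothetical stand-in class (`FakeCursor` is not part of pymongo) that behaves the same way:

```python
class FakeCursor:
    """Hypothetical stand-in for a cursor: iterable, but defines
    neither __bool__ nor __len__, so Python always treats it as truthy."""

    def __init__(self, docs):
        self._docs = list(docs)

    def __iter__(self):
        return iter(self._docs)


empty = FakeCursor([])

# The object itself is truthy even though it yields no documents.
print(bool(empty))        # True
print(list(empty) == [])  # True
```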

Therefore I checked the documentation and found two methods that can help me:

1. count(with_limit_and_skip=False), which (from the description):

   "Returns the number of documents in the results set for this query."

   It seems a good way to check, but this means that I need to count all the results in the cursor to know whether it is zero or not, right? A little bit expensive?

2. retrieved, which (from the description):

   "The number of documents retrieved so far."

   I tested it on an empty query set and it returns zero, but it's not clear what it does and I don't know if it's right for me.

So, which is the best way (best practice) to check if a find() query returns an empty set or not? Is one of the methods described above right for this purpose? And what about performance? Are there other ways to do it?

Just to be clear: I need to know if the query is empty and I'd like to find the best way with the cursor with respect to performance and being pythonic.


asked Oct 24 '14 14:10

boh717

2 Answers

EDIT: While this was true in 2014, modern versions of pymongo and MongoDB have changed this behaviour. Buyer beware:

.count() is the correct way to find the number of results that are returned in the query. The count() method does not exhaust the iterator for your cursor, so you can safely do a .count() check before iterating over the items in the result set.

Performance of the count method was greatly improved in MongoDB 2.4. The only thing that could slow down your count is whether or not the query can use an index. To find out whether your query uses an index, you can do something like

    query = collection.find({"string": field})
    print(query.explain())

If you see BasicCursor in the result, you need an index on your string field for this query.
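On MongoDB 3.0 and later the explain output no longer mentions BasicCursor; an unindexed query shows up as a `COLLSCAN` stage instead. A rough sketch of checking for that, assuming the plan is the `winningPlan` sub-document and that child stages sit under `inputStage` or `inputStages` (`uses_collection_scan` is a hypothetical helper, and the sample plans are hand-written for illustration, not real server output):

```python
def uses_collection_scan(plan):
    """Return True if any stage in an explain() plan tree is a COLLSCAN.

    `plan` is assumed to be the "winningPlan" sub-document; child stages
    are assumed to live under "inputStage" or "inputStages".
    """
    if plan.get("stage") == "COLLSCAN":
        return True
    children = list(plan.get("inputStages", []))
    if "inputStage" in plan:
        children.append(plan["inputStage"])
    return any(uses_collection_scan(child) for child in children)


# Hand-written sample plans for illustration, not real server output.
unindexed = {"stage": "COLLSCAN", "direction": "forward"}
indexed = {
    "stage": "FETCH",
    "inputStage": {"stage": "IXSCAN", "indexName": "string_1"},
}

print(uses_collection_scan(unindexed))  # True
print(uses_collection_scan(indexed))    # False
```

With a real cursor you would pass something like `query.explain()["queryPlanner"]["winningPlan"]`, assuming an unsharded deployment with that output layout.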

EDIT: as @alvapan pointed out, pymongo deprecated the cursor's count() method in version 3.7 (it was removed in 4.0) and now prefers you to use count_documents in a separate query.

item_count = collection.count_documents({"string": field})
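If all you need is a yes/no answer, count_documents also accepts a `limit` option, so the count can stop after the first match. A sketch of that idea; `has_match_by_count` is a hypothetical helper name, and `_StubCollection` is a tiny in-memory stand-in (not part of pymongo) so the snippet runs without a server:

```python
def has_match_by_count(collection, query):
    """Return True if at least one document matches `query`.

    limit=1 lets the count stop at the first match instead of
    counting the whole result set.
    """
    return collection.count_documents(query, limit=1) > 0


class _StubCollection:
    """Hypothetical in-memory stand-in; supports equality filters only."""

    def __init__(self, docs):
        self._docs = docs

    def count_documents(self, query, limit=0):
        matches = [d for d in self._docs
                   if all(d.get(k) == v for k, v in query.items())]
        return len(matches[:limit]) if limit else len(matches)


coll = _StubCollection([{"string": "a"}, {"string": "a"}, {"string": "b"}])
print(has_match_by_count(coll, {"string": "a"}))  # True
print(has_match_by_count(coll, {"string": "z"}))  # False
```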

The right way to count the number of items you've returned on a query is to check the .retrieved counter on the query after you iterate over it, or to enumerate the query in the first place:

    # Using .retrieved
    query = collection.find({"string": field})
    for item in query:
        print(item)
    print('Located {0:,} item(s)'.format(query.retrieved))

Or, another way:

    # Using the built-in enumerate
    query = collection.find({"string": field})
    count = 0  # stays 0 if the query matches nothing
    for count, item in enumerate(query, start=1):
        print(item)
    print('Located {0:,} item(s)'.format(count))
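If you want to know whether anything matched before you start processing, without running a second query, one option is to pull the first item and put it back. A sketch, assuming the cursor behaves like any other Python iterator (so `next(cursor, default)` works on it); `peek` is a hypothetical helper, and plain iterators stand in for cursors below:

```python
from itertools import chain


def peek(iterator):
    """Return (is_empty, iterator), where the returned iterator still
    yields every item, including the one consumed while peeking."""
    sentinel = object()
    first = next(iterator, sentinel)
    if first is sentinel:
        return True, iter(())
    return False, chain([first], iterator)


# Plain iterators stand in for cursors here.
is_empty, items = peek(iter([{"string": "a"}, {"string": "b"}]))
print(is_empty, list(items))

is_empty, items = peek(iter([]))
print(is_empty, list(items))
```

Note that peeking does fetch the first batch of results, so this only makes sense when you intend to iterate over them anyway.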

answered Oct 09 '22 04:10

VooDooNOFX

How about just using find_one instead of find? Then you can just check whether you got a result or None. And if "string" is indexed, you can pass fields = {"string": 1, "_id": 0} (the keyword is spelled projection in PyMongo 3.0 and later), and thus make it an index-only query, which is even faster.
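A minimal sketch of that check, assuming PyMongo 3.0 or later, where the option is passed as `projection`. `exists` is a hypothetical helper name, and `_StubCollection` is a small in-memory stand-in (not part of pymongo) so the snippet runs without a server:

```python
def exists(collection, query, projection=None):
    """Return True if find_one() locates a matching document."""
    # Compare against None explicitly: a matching document can come back
    # as an empty (falsy) dict if the projection strips every field.
    return collection.find_one(query, projection=projection) is not None


class _StubCollection:
    """Hypothetical in-memory stand-in; supports equality filters only."""

    def __init__(self, docs):
        self._docs = docs

    def find_one(self, query, projection=None):
        for doc in self._docs:
            if all(doc.get(k) == v for k, v in query.items()):
                if projection is None:
                    return dict(doc)
                return {k: doc[k] for k, on in projection.items()
                        if on and k in doc}
        return None


coll = _StubCollection([{"_id": 1, "string": "a"}])
only_string = {"string": 1, "_id": 0}
print(exists(coll, {"string": "a"}, only_string))  # True
print(exists(coll, {"string": "z"}, only_string))  # False
```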

answered Oct 09 '22 06:10

Baruch Oxman
