Time Complexity of $addToSet vs $push when element does not exist in the Array

Tags:

mongodb

Given: the Connection uses safe=True, so update() returns update information.

Say I have documents that look like:

[{'a': [1]}, {'a': [2]}, {'a': [1,2]}]

And I issue:

coll.update({}, {'$addToSet': {'a':1}}, multi=True)

The result would be:

{u'connectionId': 28,
 u'err': None,
 u'n': 3,
 u'ok': 1.0,
 u'updatedExisting': True
}

This happens even when some documents already have that value. To avoid it, I could issue this command instead:

coll.update({'a': {'$ne': 1}}, {'$push': {'a':1}}, multi=True)

What is the time complexity of $addToSet compared with $push plus a $ne check?


asked Sep 01 '12 07:09

meson10

2 Answers

It looks like $addToSet does the same thing as your command ($push with a $ne check). Both would be O(N):

https://github.com/mongodb/mongo/blob/master/src/mongo/db/ops/update_internal.cpp
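The equivalence can be modelled in plain Python. This is only a rough sketch of the semantics, not MongoDB's actual implementation; the helper names `add_to_set` and `push_if_ne` are made up for illustration. Both helpers have to scan the array once before appending, which is where the O(N) comes from.

```python
def add_to_set(array, value):
    """Model of $addToSet: scan the array, append only if value is absent.

    Returns True if the array was modified.
    """
    if value in array:          # linear scan over the array
        return False
    array.append(value)
    return True


def push_if_ne(array, value):
    """Model of the filter {'a': {'$ne': value}} followed by $push.

    Returns True if the document matched the filter (and was pushed to).
    """
    matches = all(item != value for item in array)  # also a linear scan
    if matches:
        array.append(value)
    return matches


# The three documents from the question, once per approach.
docs_a = [{'a': [1]}, {'a': [2]}, {'a': [1, 2]}]
docs_b = [{'a': [1]}, {'a': [2]}, {'a': [1, 2]}]

modified_a = sum(add_to_set(d['a'], 1) for d in docs_a)
matched_b = sum(push_if_ne(d['a'], 1) for d in docs_b)

print(docs_a == docs_b)         # both approaches leave the same documents
print(modified_a, matched_b)    # only one document actually needed the value
```

In this model the resulting documents are identical; the difference is only in which documents the update touches, which is what the `n` in the question's result reflects.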

If speed is really important, why not use a hash? Instead of:

{'$addToSet': {'a':1}}
{'$addToSet': {'a':10}}

use (with a stored as an embedded document keyed by value rather than as an array):

{'$set': {'a.1': 1}}
{'$set': {'a.10': 1}}
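A minimal sketch of the hash idea, assuming `a` is stored as an embedded document rather than an array. `hash_set_update` and `apply_set` are hypothetical helpers: the first builds the update document, the second is a tiny in-memory stand-in for what the server would do with a one-level dotted `$set`.

```python
def hash_set_update(field, value):
    """Build a $set update that marks `value` as present under `field`."""
    return {'$set': {'%s.%s' % (field, value): 1}}


def apply_set(doc, update):
    """In-memory stand-in for applying a one-level dotted $set to a document."""
    for path, flag in update['$set'].items():
        field, key = path.split('.', 1)
        doc.setdefault(field, {})[key] = flag   # dict insert, no array scan
    return doc


doc = {'a': {}}
apply_set(doc, hash_set_update('a', 1))
apply_set(doc, hash_set_update('a', 10))
apply_set(doc, hash_set_update('a', 1))    # repeating a value changes nothing

members = sorted(doc['a'], key=int)        # keys are strings, like field names
print(doc)
print(members)
```

The trade-off is that the members become field names (strings) instead of array elements, so reading the set back means listing the keys of `a`.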


answered Nov 15 '22 19:11

andy boot

Edit

OK, I read your question wrong all along: you are actually comparing two different queries and judging the time complexity between them.

The first query being:

coll.update({}, {'$addToSet': {'a':1}}, multi=True)

And the second being:

coll.update({'a': {'$ne': 1}}, {'$push': {'a':1}}, multi=True)

The first problem that springs to mind is indexes. $addToSet is an update modifier, so I do not believe it uses an index; as such, you are doing a full table scan to accomplish what you need.

In reality you are looking for all documents that do not already have 1 in a, and then looking to $push the value 1 onto that a array.

So that is 2 points to the second query even before we get into time complexity, because the first query:

Does not use indexes
Would be a full table scan
Would then do a full array scan (with no index) to $addToSet

So I have pretty much made up my mind that the second query is what you're looking for, before any of the Big O notation stuff.

There is a problem with using Big O notation to explain the time complexity of each query here:

I am unsure of what perspective you want, whether it is per document or for the whole collection.
I am unsure about indexes. Using an index on a would make the lookup logarithmic; not using one would not.

However, the first query would look something like O(n) per document, since:

The $addToSet would need to iterate over each element
The $addToSet would then need to do an O(1) op to insert the value if it does not exist. I should note I am unsure whether the O(1) is cancelled out or not (light reading suggests my version), so I have cancelled it out here.

Per collection, without the index, it would be O(2n²), since the cost of iterating a is paid again for every new document.

The second query, without indexes, would look something like O(2n²) (O(n) per document), I believe, since $ne would have the same problems as $addToSet without indexes. However, with indexes I believe this would actually be O(log n log n) (O(log n) per document), since it would first find all documents that have a and then all documents without 1 in their set, based upon the b-tree.
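The unindexed, per-collection growth can be sanity-checked with a rough counting model. This is a sketch under assumptions, not a measurement of MongoDB: `count_comparisons` and `make_docs` are hypothetical helpers, and the model simply counts element comparisons when every document's array has to be scanned for the value.

```python
def count_comparisons(docs, value):
    """Count element comparisons for an unindexed scan of every document's array."""
    comparisons = 0
    for doc in docs:
        for item in doc['a']:
            comparisons += 1
            if item == value:   # stop scanning this array once the value is found
                break
    return comparisons


def make_docs(n_docs, array_len):
    """Worst case: arrays hold 2..array_len+1, so the value 1 is never present."""
    return [{'a': list(range(2, array_len + 2))} for _ in range(n_docs)]


small = count_comparisons(make_docs(10, 10), 1)   # 10 documents, 10 elements each
large = count_comparisons(make_docs(20, 20), 1)   # 20 documents, 20 elements each

print(small, large)
```

Doubling both the number of documents and the array length quadruples the work in this model, which is the quadratic behaviour described above; with a fixed array length the work would grow only linearly with the number of documents.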

So based upon time complexity and the notes at the beginning I would say query 2 is better.

If I am honest I am not used to explaining in "Big O" Notation so this is experimental.

Hope it helps,

answered Nov 15 '22 19:11

Sammaye
