Given: Connection is Safe=True so Update's return will contain update information.
Say I have a documents that look like:
[{'a': [1]}, {'a': [2]}, {'a': [1,2]}]
And I issue:
coll.update({}, {'$addToSet': {'a':1}}, multi=True)
The result would be:
{u'connectionId': 28,
u'err': None,
u'n': 3,
u'ok': 1.0,
u'updatedExisting': True
}
Even when come documents already have that value. To avoid this I could issue a command.
coll.update({'a': {'$ne': 1}}, {'$push': {'a':1}}, multi=True)
What's the Time Complexity Comparison for $addToSet vs. $push with a $ne check ?
2 Answers. The difference between $push/$addtoset is the $addToSet do not add the item to the given field if it already contains it, on the other hand, $push will add the given object to field whether it exists or not. db.
Definition. The $addToSet operator adds a value to an array unless the value is already present, in which case $addToSet does nothing to that array. The $addToSet operator has the form: { $addToSet: { <field1>: <value1>, ... } }
If the value is an array, $push appends the whole array as a single element. To add each element of the value separately, use the $each modifier with $push . For an example, see Append a Value to Arrays in Multiple Documents. For a list of modifiers available for $push , see Modifiers.
In MongoDB, the $push operator is used to appends a specified value to an array. If the mentioned field is absent in the document to update, the $push operator add it as a new field and includes mentioned value as its element. If the updating field is not an array type field the operation failed.
Looks like $addToSet is doing the same thing as your command: $push with a $ne check. Both would be O(N)
https://github.com/mongodb/mongo/blob/master/src/mongo/db/ops/update_internal.cpp
if speed is really important then why not use a hash:
instead of:
{'$addToSet': {'a':1}}
{'$addToSet': {'a':10}}
use:
{$set: {'a.1': 1}
{$set: {'a.10': 1}
Ok since I read your question wrong all along it turns out that actually you are looking at two different queries and judging the time complexity between them.
The first query being:
coll.update({}, {'$addToSet': {'a':1}}, multi=True)
And the second being:
coll.update({'a': {'$ne': 1}}, {'$push': {'a':1}}, multi=True)
First problem springs to mind here, no indexes. $addToSet
, being an update modifier, I do not believe it uses an index as such you are doing a full table scan to accomplish what you need.
In reality you are looking for all documents that do not have 1
in a
already and looking to $push
the value 1
to that a
array.
So 2 points to the second query even before we get into time complexity here because the first query:
$addToSet
So I have pretty much made my mind up here that the second query is what your looking for before any of the Big O notation stuff.
There is a problem to using big O notation to explain the time complexity of each query here:
a
however not using indexes does not.However the first query would look something like: O(n) per document since:
Per collection, without the index it would be: O(2n2) since the complexity of iterating a
will expodentially increase with every new document.
The second query, without indexes, would look something like: O(2n2) (O(n) per document) I believe since $ne
would have the same problems as $addToSet
without indexes. However with indexes I believe this would actually be O(log n log n) (O(log n) per document) since it would first find all documents with a
in then all documents without 1
in their set based upon the b-tree.
So based upon time complexity and the notes at the beginning I would say query 2 is better.
If I am honest I am not used to explaining in "Big O" Notation so this is experimental.
Hope it helps,
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With