Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Time Complexity of $addToset vs $push when element does not exist in the Array

Tags:

mongodb

Given: Connection is Safe=True so Update's return will contain update information.

Say I have a documents that look like:

[{'a': [1]}, {'a': [2]}, {'a': [1,2]}]

And I issue:

coll.update({}, {'$addToSet': {'a':1}}, multi=True)

The result would be:

{u'connectionId': 28,
 u'err': None,
 u'n': 3,
 u'ok': 1.0,
 u'updatedExisting': True
}

Even when come documents already have that value. To avoid this I could issue a command.

coll.update({'a': {'$ne': 1}}, {'$push': {'a':1}}, multi=True)

What's the Time Complexity Comparison for $addToSet vs. $push with a $ne check ?

like image 716
meson10 Avatar asked Sep 01 '12 07:09

meson10


People also ask

What is the difference between $Push and $addToSet?

2 Answers. The difference between $push/$addtoset is the $addToSet do not add the item to the given field if it already contains it, on the other hand, $push will add the given object to field whether it exists or not. db.

What does $addToSet do in MongoDB?

Definition. The $addToSet operator adds a value to an array unless the value is already present, in which case $addToSet does nothing to that array. The $addToSet operator has the form: { $addToSet: { <field1>: <value1>, ... } }

How to add element in array in MongoDB?

If the value is an array, $push appends the whole array as a single element. To add each element of the value separately, use the $each modifier with $push . For an example, see Append a Value to Arrays in Multiple Documents. For a list of modifiers available for $push , see Modifiers.

How do I use $Push in MongoDB?

In MongoDB, the $push operator is used to appends a specified value to an array. If the mentioned field is absent in the document to update, the $push operator add it as a new field and includes mentioned value as its element. If the updating field is not an array type field the operation failed.


2 Answers

Looks like $addToSet is doing the same thing as your command: $push with a $ne check. Both would be O(N)

https://github.com/mongodb/mongo/blob/master/src/mongo/db/ops/update_internal.cpp

if speed is really important then why not use a hash:

instead of:

{'$addToSet': {'a':1}}
{'$addToSet': {'a':10}}

use:

{$set: {'a.1': 1}
{$set: {'a.10': 1}
like image 174
andy boot Avatar answered Nov 15 '22 19:11

andy boot


Edit

Ok since I read your question wrong all along it turns out that actually you are looking at two different queries and judging the time complexity between them.

The first query being:

coll.update({}, {'$addToSet': {'a':1}}, multi=True)

And the second being:

coll.update({'a': {'$ne': 1}}, {'$push': {'a':1}}, multi=True)

First problem springs to mind here, no indexes. $addToSet, being an update modifier, I do not believe it uses an index as such you are doing a full table scan to accomplish what you need.

In reality you are looking for all documents that do not have 1 in a already and looking to $push the value 1 to that a array.

So 2 points to the second query even before we get into time complexity here because the first query:

  • Does not use indexes
  • Would be a full table scan
  • Would then do a full array scan (with no index) to $addToSet

So I have pretty much made my mind up here that the second query is what your looking for before any of the Big O notation stuff.

There is a problem to using big O notation to explain the time complexity of each query here:

  • I am unsure of what perspective you want, whether it is per document or for the whole collection.
  • I am unsure of indexes as such. Using indexes will actually create a Log algorithm on a however not using indexes does not.

However the first query would look something like: O(n) per document since:

  • The $addToSet would need to iterate over each element
  • The $addToSet would then need to do an O(1) op to insert the set if it does not exist. I should note I am unsure whether the O(1) is cancelled out or not (light reading suggests my version), I have cancelled it out here.

Per collection, without the index it would be: O(2n2) since the complexity of iterating a will expodentially increase with every new document.

The second query, without indexes, would look something like: O(2n2) (O(n) per document) I believe since $ne would have the same problems as $addToSet without indexes. However with indexes I believe this would actually be O(log n log n) (O(log n) per document) since it would first find all documents with a in then all documents without 1 in their set based upon the b-tree.

So based upon time complexity and the notes at the beginning I would say query 2 is better.

If I am honest I am not used to explaining in "Big O" Notation so this is experimental.

Hope it helps,

like image 42
Sammaye Avatar answered Nov 15 '22 19:11

Sammaye