 

Pymongo / MongoDB: create index or ensure index?

I don't understand the difference between create_index and ensure_index in PyMongo. The MongoDB indexes page says:

you can create an index by calling the ensureIndex()

However, in PyMongo there are two different methods, create_index and ensure_index, and the documentation for ensure_index says:

Unlike create_index(), which attempts to create an index unconditionally, ensure_index() takes advantage of some caching within the driver such that it only attempts to create indexes that might not already exist. When an index is created (or ensured) by PyMongo it is “remembered” for ttl seconds. Repeated calls to ensure_index() within that time limit will be lightweight - they will not attempt to actually create the index.

Am I right in understanding that ensure_index will create a permanent index, or do I need to use create_index for this?

asked May 06 '11 at 14:05 by YXD


People also ask

How do you create an index in PyMongo?

Create an index for a MongoDB collection and specify its sort order: when you use PyMongo to create an index, you pair each field name with either 1 or -1 (pymongo.ASCENDING or pymongo.DESCENDING) to explicitly set ascending or descending order, respectively.
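A minimal sketch of create_index() with explicit sort orders. `posts` is assumed to be a PyMongo `Collection`, and the field names `subject`, `author` and `date` are placeholders. The literal values 1 and -1 stand in for `pymongo.ASCENDING` and `pymongo.DESCENDING` so the sketch runs without pymongo installed.

```python
# In PyMongo, pymongo.ASCENDING == 1 and pymongo.DESCENDING == -1; the literal
# values are used here so the sketch runs without pymongo installed.
ASCENDING = 1
DESCENDING = -1


def create_post_indexes(posts):
    """Create two indexes on `posts`, assumed to be a pymongo Collection.

    create_index() takes a list of (field, direction) pairs and returns the
    index name; PyMongo's default name joins each field with its direction,
    e.g. "subject_1".
    """
    # Single-field index on "subject", ascending.
    subject = posts.create_index([("subject", ASCENDING)])
    # Compound index: "author" ascending, then "date" descending.
    author_date = posts.create_index([("author", ASCENDING), ("date", DESCENDING)])
    return [subject, author_date]
```

Against a real server, `posts` could come from something like `MongoClient()["blog"]["posts"]` (placeholder database and collection names), and the call would return names such as `["subject_1", "author_1_date_-1"]`.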

Does MongoDB automatically create indexes?

MongoDB automatically determines whether to create a multikey index if the indexed field contains an array value; you do not need to explicitly specify the multikey type.

What is ensure index in MongoDB?

Indexes store the value of a specific field or set of fields, ordered by the value of the field as specified in the index. In MongoDB, the ensureIndex() method is used to create an index, although since MongoDB 3.0 it is only a deprecated alias for createIndex().

Why do we create index in MongoDB?

Indexes support the efficient resolution of queries. Without indexes, MongoDB must scan every document in a collection to select those that match the query. This scan is highly inefficient and requires MongoDB to process a large volume of data.
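The difference between a collection scan and an index lookup can be illustrated with a toy model. This is only an analogy, not how MongoDB is implemented: MongoDB indexes are B-trees, while here a sorted Python list of `(key, _id)` pairs searched with `bisect` plays that role. The data set and function names are invented for the example.

```python
import bisect

# Toy data: 1,000 documents whose "age" cycles through 0..49.
docs = [{"_id": i, "age": (i * 7) % 50} for i in range(1000)]


def collection_scan(docs, field, value):
    """No index: every document is examined to find the matches."""
    examined = 0
    matches = []
    for doc in docs:
        examined += 1
        if doc.get(field) == value:
            matches.append(doc["_id"])
    return matches, examined


def build_index(docs, field):
    """Sorted (key, _id) pairs: a toy stand-in for a B-tree index."""
    return sorted((doc[field], doc["_id"]) for doc in docs if field in doc)


def index_lookup(index, value):
    """Binary-search to the first entry for `value`, then read only matching entries."""
    examined = 0  # index entries read after the binary search
    matches = []
    # (value,) sorts before every (value, _id) pair, so this finds the first match.
    pos = bisect.bisect_left(index, (value,))
    while pos < len(index) and index[pos][0] == value:
        examined += 1
        matches.append(index[pos][1])
        pos += 1
    return matches, examined


scan_ids, scan_examined = collection_scan(docs, "age", 30)
index_ids, index_examined = index_lookup(build_index(docs, "age"), 30)
print(scan_examined, index_examined)  # 1000 20
```

Both approaches return the same documents, but the scan touches all 1,000 of them while the lookup reads only the 20 matching index entries (plus a handful of binary-search comparisons).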


2 Answers

@andreas-jung is right that ensure_index() is a wrapper over create_index(). I think the confusion arises from this phrase:

When an index is created (or ensured) by PyMongo it is “remembered” for ttl seconds.

It's not that the index is temporary or "transient". What happens is that, for the specified number of seconds, a call to ensure_index() that tries to create the same index again has no effect and does not call create_index() underneath. Once that "cache" expires, the next call to ensure_index() calls create_index() again.

I completely understand your confusion, because quite frankly PyMongo's docs don't do a very good job of explaining how this works, but if you head over to the Ruby docs, the explanation is a little clearer:

  • (String) ensure_index(spec, opts = {})

Calls create_index and sets a flag to not do so again for another X minutes. This time can be specified as an option when initializing a Mongo::DB object as options[:cache_time]. Any changes to an index will be propagated through regardless of cache time (e.g., a change of index direction).

The parameters and options for this method are the same as those for Collection#create_index.

Examples:

Call sequence:

Time t: @posts.ensure_index([['subject', Mongo::ASCENDING]]) -- calls create_index and sets the 5 minute cache

Time t+2min: @posts.ensure_index([['subject', Mongo::ASCENDING]]) -- doesn't do anything

Time t+3min: @posts.ensure_index([['something_else', Mongo::ASCENDING]]) -- calls create_index and sets the 5 minute cache

Time t+10min: @posts.ensure_index([['subject', Mongo::ASCENDING]]) -- calls create_index and resets the 5 minute counter

I'm not claiming that the drivers work exactly the same way; it's just that, for illustration purposes, their explanation is a little better IMHO.
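The Ruby call sequence can be replayed with a small simulation of a driver-side cache. This is a hypothetical model written for illustration, not PyMongo's or the Ruby driver's actual code: `EnsureIndexCache`, its `ttl` parameter and the fake `create_index` callable are invented names, the 300-second TTL mirrors the 5-minute window in the Ruby example, and time is passed in explicitly (`now`, in seconds) so the replay is deterministic.

```python
class EnsureIndexCache:
    """Hypothetical model of a driver-side ensure_index() cache (not real driver code)."""

    def __init__(self, create_index, ttl=300):
        self._create_index = create_index  # callable(keys) -> index name
        self._ttl = ttl                    # seconds an ensured index is "remembered"
        self._expires = {}                 # index name -> expiry timestamp

    def ensure_index(self, keys, now):
        # Name the index the way PyMongo does by default, e.g. "subject_1".
        name = "_".join(f"{field}_{direction}" for field, direction in keys)
        expiry = self._expires.get(name)
        if expiry is not None and now < expiry:
            return None  # still cached: skip the call to create_index()
        self._expires[name] = now + self._ttl
        return self._create_index(keys)


calls = []


def fake_create_index(keys):
    """Stand-in for Collection.create_index(): records the call, returns a name."""
    calls.append(keys)
    return "_".join(f"{field}_{direction}" for field, direction in keys)


MIN = 60
cache = EnsureIndexCache(fake_create_index, ttl=5 * MIN)
results = [
    cache.ensure_index([("subject", 1)], now=0),                # calls create_index
    cache.ensure_index([("subject", 1)], now=2 * MIN),          # cached: does nothing
    cache.ensure_index([("something_else", 1)], now=3 * MIN),   # different index: calls it
    cache.ensure_index([("subject", 1)], now=10 * MIN),         # cache expired: calls it again
]
print(results)  # ['subject_1', None, 'something_else_1', 'subject_1']
```

The index itself is never "forgotten" on the server; only the client-side note that it was recently requested expires. Changing the direction (e.g. `[("subject", -1)]`) yields a different name and so bypasses the cache, which matches the Ruby note that index changes propagate regardless of cache time. In real code `now` would come from something like `time.monotonic()`.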

answered Sep 21 '22 at 05:09 by Juan Gomez


Keep in mind that since MongoDB 3.0, ensureIndex is deprecated and should be avoided.

Deprecated since version 3.0.0: db.collection.ensureIndex() is now an alias for db.collection.createIndex().

The same is true in PyMongo:

DEPRECATED - Ensures that an index exists on this collection.

This means you should always use create_index.
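For legacy code that still calls ensure_index() (which was removed entirely in PyMongo 4.0), one option is a thin forwarding helper. This is a hypothetical sketch: the module-level `ensure_index(collection, keys, **kwargs)` function is invented here, and dropping a `cache_for` keyword assumes the caller used the older PyMongo ensure_index signature. No client-side cache is needed, because the server does not recreate an index that already exists with the same keys and options.

```python
import warnings


def ensure_index(collection, keys, **kwargs):
    """Hypothetical drop-in for legacy Collection.ensure_index() calls.

    Forwards to create_index(); `collection` is assumed to be a pymongo
    Collection and `keys` a list of (field, direction) pairs.
    """
    warnings.warn(
        "ensure_index is deprecated; call create_index directly",
        DeprecationWarning,
        stacklevel=2,
    )
    # Assumed legacy cache-window argument: meaningless without the driver
    # cache, so it is discarded rather than sent to the server as an option.
    kwargs.pop("cache_for", None)
    return collection.create_index(keys, **kwargs)
```

Once all call sites go through this helper, replacing it with direct `collection.create_index(...)` calls is a mechanical change.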

answered Sep 21 '22 at 05:09 by Salvador Dali