MongoDB - Difference between index on text field and text index?

Tags: text, indexing, mongodb

For a MongoDB field that contains strings (for example, state or province names), what (if any) difference is there between creating an index on a string-type field:

```
db.collection.ensureIndex( { field: 1 } )
```

and creating a text index on that field:

```
db.collection.ensureIndex( { field: "text" } )
```

In both cases, `field` holds string values.

I'm looking for a way to do a case-insensitive search on a text field which would contain a single word (maybe more). Being new to Mongo, I'm having trouble distinguishing between using the above two index methods, and even something like a $regex search.

asked Jun 19 '14 20:06

russdot

2 Answers

The two index options are very different.

When you create a regular index on a string field, it indexes the entire string value. This is mostly useful for single-word strings (like a username for logins) that you match exactly.
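To make the "whole value" point concrete, here is a toy model in plain JavaScript (runnable with Node.js, no MongoDB needed). A sorted array of complete string values stands in for the index; the data and the helper names `lowerBound` and `findExact` are made up for this example, and MongoDB's real index is a B-tree, not an array. The model only finds the complete, identically cased value. By default (no collation), MongoDB compares string index keys in the same case-sensitive way, so a shell query like `db.collection.find({ state: "Texas" })` can use the index, but it will not match "texas" or part of a value.

```javascript
// Toy model of a regular ascending index: [key, _id] pairs sorted by the
// WHOLE string value. Illustration only; MongoDB actually uses a B-tree.
const docs = [
  { _id: 1, state: "Texas" },
  { _id: 2, state: "Alabama" },
  { _id: 3, state: "Alaska" },
  { _id: 4, state: "New Mexico" },
];

const index = docs
  .map((d) => [d.state, d._id])
  .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));

// Binary search for the first entry whose key is >= value.
function lowerBound(idx, value) {
  let lo = 0, hi = idx.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (idx[mid][0] < value) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Exact-match lookup: only keys equal to the whole value match.
function findExact(idx, value) {
  const ids = [];
  for (let i = lowerBound(idx, value); i < idx.length && idx[i][0] === value; i++) {
    ids.push(idx[i][1]);
  }
  return ids;
}

console.log(findExact(index, "Texas"));  // [ 1 ]
console.log(findExact(index, "texas"));  // [] (comparison is case-sensitive)
console.log(findExact(index, "Mexico")); // [] (part of a value does not match)
```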
A text index, on the other hand, tokenizes and stems the content of the field: it breaks the string into individual words (tokens) and then reduces each one to its stem, so that variants of the same word match ("talk" matches "talks", "talked" and "talking", for example, because "talk" is the stem of all three). This is mostly useful for true text (sentences, paragraphs, etc.).
> Text Search
>
> Text search supports the search of string content in documents of a collection. MongoDB provides the $text operator to perform text search in queries and in aggregation pipelines.
>
> The text search process:
>
> - tokenizes and stems the search term(s) during both the index creation and the text command execution.
> - assigns a score to each document that contains the search term in the indexed fields. The score determines the relevance of a document to a given search query.
>
> The $text operator can search for words and phrases. The query matches on the complete stemmed words. For example, if a document field contains the word blueberry, a search on the term blue will not match the document. However, a search on either blueberry or blueberries will match.
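The tokenize-and-stem behavior can be sketched as a small inverted index (stem to set of document ids). This is a hypothetical toy, not MongoDB's implementation: `stem` is a deliberately naive suffix stripper written for this example (MongoDB uses proper language-specific stemmers), and the documents are invented. It reproduces the examples above: "talk" matches "talks", "talked" and "talking", "blue" does not match "blueberry", but "blueberries" does. In the shell, the real query would look like `db.collection.find({ $text: { $search: "talk" } })`.

```javascript
// Naive stemmer for illustration only; NOT the stemmer MongoDB uses.
function stem(word) {
  const rules = [["ies", "i"], ["ing", ""], ["ed", ""], ["s", ""]];
  for (const [suffix, repl] of rules) {
    // Strip only when a stem of 3+ characters remains.
    if (word.endsWith(suffix) && word.length - suffix.length >= 3) {
      word = word.slice(0, word.length - suffix.length) + repl;
      break;
    }
  }
  // Map a trailing "y" to "i" so "blueberry" and "blueberries" share a stem.
  return word.endsWith("y") ? word.slice(0, -1) + "i" : word;
}

// Lowercase, split into words, stem each word.
function tokens(text) {
  return (text.toLowerCase().match(/[a-z]+/g) || []).map(stem);
}

// Inverted index: stem -> Set of document _ids.
function buildTextIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    for (const t of tokens(doc[field])) {
      if (!index.has(t)) index.set(t, new Set());
      index.get(t).add(doc._id);
    }
  }
  return index;
}

// Search terms go through the same tokenize + stem step; terms are OR'ed.
function textSearch(index, query) {
  const ids = new Set();
  for (const t of tokens(query)) {
    for (const id of index.get(t) || []) ids.add(id);
  }
  return [...ids].sort((a, b) => a - b);
}

const docs = [
  { _id: 1, body: "She talks a lot" },
  { _id: 2, body: "We talked yesterday" },
  { _id: 3, body: "Talking is easy" },
  { _id: 4, body: "Blueberry pie" },
];
const idx = buildTextIndex(docs, "body");

console.log(textSearch(idx, "talk"));        // [ 1, 2, 3 ]
console.log(textSearch(idx, "blue"));        // []
console.log(textSearch(idx, "blueberries")); // [ 4 ]
```

Because both the stored text and the query are lowercased before stemming, the toy is also case-insensitive; MongoDB's text indexes are likewise case-insensitive, which is relevant to the single-word, case-insensitive search in the question.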
$regex searches can be used with regular indexes on string fields to provide some pattern matching and wildcard search. $regex is not a terribly efficient user of indexes, but it will use them where it can:

> If an index exists for the field, then MongoDB matches the regular expression against the values in the index, which can be faster than a collection scan. Further optimization can occur if the regular expression is a “prefix expression”, which means that all potential matches start with the same string. This allows MongoDB to construct a “range” from that prefix and only match against those values from the index that fall within that range.
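The prefix-range optimization can be illustrated with the same kind of toy model (invented data, not MongoDB's query planner). Every key matching `/^Ala/` must sort between "Ala" (inclusive) and "Alb" (exclusive), so only that slice of the sorted keys is examined, while an unanchored pattern such as `/ama/` has to be tested against every key. Computing the upper bound by bumping the last character is an assumption made for this sketch. A case-insensitive pattern such as `/^ala/i` cannot be narrowed to one range like this, which is why the MongoDB `$regex` documentation notes that case-insensitive regexes generally cannot use indexes effectively.

```javascript
// Sorted index keys (toy data).
const keys = ["Alabama", "Alaska", "Arizona", "Kansas", "Texas"].sort();

// First position whose key is >= value.
function lowerBound(arr, value) {
  let lo = 0, hi = arr.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (arr[mid] < value) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Exclusive upper bound of the prefix range ("Ala" -> "Alb").
// Assumes a non-empty prefix whose last character is below U+FFFF.
function prefixUpperBound(prefix) {
  const last = prefix.charCodeAt(prefix.length - 1);
  return prefix.slice(0, -1) + String.fromCharCode(last + 1);
}

// Anchored regex: test only the keys inside [prefix, upperBound).
function regexWithPrefix(arr, prefix, re) {
  const slice = arr.slice(lowerBound(arr, prefix), lowerBound(arr, prefixUpperBound(prefix)));
  return { matches: slice.filter((k) => re.test(k)), examined: slice.length };
}

// Unanchored regex: every key must be tested.
function regexFullScan(arr, re) {
  return { matches: arr.filter((k) => re.test(k)), examined: arr.length };
}

console.log(regexWithPrefix(keys, "Ala", /^Ala/)); // { matches: [ 'Alabama', 'Alaska' ], examined: 2 }
console.log(regexFullScan(keys, /ama/));           // { matches: [ 'Alabama' ], examined: 5 }
```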

http://docs.mongodb.org/manual/core/index-text/

http://docs.mongodb.org/manual/reference/operator/query/regex/

answered Oct 12 '22 00:10

John Petrone

Text indexes allow you to search for words inside text. You can do something similar with a regex on a field that has no text index, but it would be much slower.

Prior to MongoDB 2.6, text search operations had to be run with their own command, which was a big drawback because you couldn't combine them with other filters or treat the result as a regular cursor. As of 2.6, text search is just another operator for the usual find method, and that's super nice.

So why is a text index, and the searches that use it, faster than a regex on a non-indexed text field? Because a text index works like a dictionary, a clever one that discards stop words on a per-language basis (English by default). When you run a text search query, you run it against the dictionary, saving the time that would otherwise be spent iterating over the whole collection.
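The "dictionary" idea can be sketched as a toy comparison: a word-to-ids dictionary built once, with a tiny made-up stop-word list, versus a regex that has to be run against every document. The documents, the stop-word list and the `examined` counter are illustrative assumptions; MongoDB's real stop-word lists are per language and its index is stored very differently.

```javascript
// Tiny, made-up stop-word list (MongoDB's real lists are per language).
const STOP_WORDS = new Set(["the", "a", "an", "and", "of", "is"]);

const docs = [
  { _id: 1, body: "The state of Texas" },
  { _id: 2, body: "A state and a province" },
  { _id: 3, body: "Province of Ontario" },
];

// Build once: word -> array of document ids; stop words get no entry.
const dictionary = new Map();
for (const doc of docs) {
  for (const word of doc.body.toLowerCase().split(/\W+/)) {
    if (!word || STOP_WORDS.has(word)) continue;
    if (!dictionary.has(word)) dictionary.set(word, []);
    const ids = dictionary.get(word);
    if (!ids.includes(doc._id)) ids.push(doc._id);
  }
}

// Regex approach: every document is tested. Assumes `word` is letters only.
function scanSearch(word) {
  const re = new RegExp(`\\b${word}\\b`, "i");
  return { ids: docs.filter((d) => re.test(d.body)).map((d) => d._id), examined: docs.length };
}

// Dictionary approach: one lookup yields the matching ids directly.
function dictionarySearch(word) {
  const ids = dictionary.get(word.toLowerCase()) || [];
  return { ids, examined: ids.length };
}

console.log(scanSearch("province"));       // { ids: [ 2, 3 ], examined: 3 }
console.log(dictionarySearch("province")); // { ids: [ 2, 3 ], examined: 2 }
console.log(dictionarySearch("the"));      // { ids: [], examined: 0 }
```

The last line shows the trade-off of discarding stop words: "the" is never indexed, so a dictionary lookup for it finds nothing even though the regex scan would match document 1.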

Keep in mind that the text index will grow along with your collection, and it can use a lot of space. I learnt this the hard way when using capped collections. There's no way to cap text indexes.

A regular index on a text field, such as

```
db.collection.ensureIndex( { field: 1 } )
```

will be useful only if you search for the whole text. It's used, for example, to look up alphanumeric hashes. It doesn't make sense to apply this kind of index to text paragraphs, phrases, etc.

answered Oct 11 '22 23:10

ffflabs
