Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Full-text search on MongoDB GridFS?

Say, if I want to store PDFs or ePub files using MongoDB's GridFS, is it possible to perform full-text searching on the data files?

like image 248
ikevin8me Avatar asked May 08 '12 01:05

ikevin8me


People also ask

Can MongoDB do full-text search?

MongoDB offers a full-text search solution, MongoDB Atlas Search, for data hosted on MongoDB Atlas.

How do I search text in MongoDB?

Use the $text query operator to perform text searches on a collection with a text index. $text will tokenize the search string using whitespace and most punctuation as delimiters, and perform a logical OR of all such tokens in the search string.

How does text search work in MongoDB?

MongoDB text search uses the Snowball stemming library to reduce words to an expected root form (or stem) based on common language rules. Algorithmic stemming provides a quick reduction, but languages have exceptions (such as irregular or contradicting verb conjugation patterns) that can affect accuracy.

What is the use of GridFS in MongoDB?

In MongoDB, use GridFS for storing files larger than 16 MB. In some situations, storing large files may be more efficient in a MongoDB database than on a system-level filesystem. If your filesystem limits the number of files in a directory, you can use GridFS to store as many files as needed.


1 Answers

You can't currently do real full text search within mongo: http://www.mongodb.org/display/DOCS/Full+Text+Search+in+Mongo

Feel free to vote for it here: https://jira.mongodb.org/browse/SERVER-380

Mongo is more of a general purpose scalable data store, and as of yet it doesn't have any full text search support. Depending on your use case, you could use the standard b-tree indexes with an array of all of the words in the text, but it won't do stemming or fuzzy matches, etc.

However, I would recommend combining mongodb with a lucene-based application (elastic search is popular). You can store all of your data in mongodb (binary data, metadata, etc.), and then index the plain text of your documents in lucene. Or, if your use case is pure full text search, you might consider just using elastic search instead of mongodb.

Update (April 2013): MongoDB 2.4 now supports a basic full-text index! Some useful resources below.

http://docs.mongodb.org/manual/applications/text-search/

http://docs.mongodb.org/manual/reference/command/text/#dbcmd.text

http://blog.mongohq.com/blog/2013/01/22/first-week-with-mongodb-2-dot-4-development-release/

like image 109
Eve Freeman Avatar answered Oct 16 '22 05:10

Eve Freeman