How does MongoDB's full text search currently compare to Lucene? I'm asking because I'm hesitant to:
a) use MongoDB's FTS implementation in production, since it was still in beta around 6 months ago
and
b) adopt Lucene, because it runs on Java and would introduce yet another moving part.
MongoDB queries that filter data by searching for exact matches, using greater-than or less-than comparisons, or by using regular expressions will work well enough in many situations. However, these methods fall short when it comes to filtering against fields containing rich textual data.
Amazon and MongoDB both use Lucene every day, and the most important use case is no doubt application search, in which the engine is primarily used by humans.
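Why is Lucene faster? Lucene is very fast at searching for data because of its inverted index technique. Normally, data sources structure data as objects or records, which in turn have fields and values. An inverted index instead maps each term to the documents that contain it, so a query looks up its terms directly rather than scanning every record.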
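To make the inverted index idea concrete, here is a minimal sketch in plain Python. It is not how Lucene is actually implemented (Lucene's index format is far more elaborate); the function names `tokenize`, `build_inverted_index` and `search`, and the sample documents, are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # One dictionary lookup per term, then intersect the posting sets.
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical sample data, keyed by document id.
docs = {
    1: "MongoDB offers basic full text search",
    2: "Lucene uses an inverted index for fast text search",
    3: "Relational databases also ship full text features",
}
index = build_inverted_index(docs)
matches = search(index, "text search")  # ids of docs containing both terms
```

The point of the structure is that `search` never touches the documents themselves: it only looks up each term's posting set and intersects them, which is why lookups stay cheap as the collection grows.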
Elasticsearch offers high scalability, reliability, and performance. MongoDB also uses text indexes for full-text queries, but the search is comparatively slow, and it does not provide configurable tokenizers and analyzers the way Elasticsearch does.
Without wandering into a long topic that would probably not suit a programming forum, I'll try to keep this brief while still covering the main points.
The main thing to consider when jumping into a broad comparison is this: "How does 'XYZ' relational database engine full text search stack up against Lucene?"
So if you consider that, and have had experience with the built-in "full text" capabilities of those products, then those are the apples you should be comparing with the MongoDB "full text" apples.
In short, MongoDB offers basic full text capabilities, not much different from those found in relational products. As for a), the facilities are new, but better than what was there before, which was nothing.
On b), Lucene and its derivatives and counterparts (Solr, Elasticsearch, etc.) should be considered a different animal altogether. They are what you reach for when you need advanced tokenizing and stemming, or built-in facilities for "More like this" and facet counts on searches. In those cases the separate product is a necessity.
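For readers unfamiliar with the terms, the following toy Python sketch shows roughly what "tokenizing and stemming" and "facet counts" mean. It is an assumption-laden illustration only: the stop word list, the suffix rules in `naive_stem`, and the `category` field are all invented here, and real analyzers in Lucene, Solr or Elasticsearch are considerably more sophisticated.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list.
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in"}


def naive_stem(token):
    """Strip a few common English suffixes; a toy stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Tokenize, drop stop words, and stem: a minimal 'analyzer' pipeline."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]


def facet_counts(hits, field):
    """Count how many matching documents fall under each value of a field."""
    return Counter(doc[field] for doc in hits if field in doc)


# Hypothetical search hits with an invented 'category' field.
hits = [
    {"title": "Text indexes in MongoDB", "category": "database"},
    {"title": "Lucene analyzers", "category": "search"},
    {"title": "Full text in relational engines", "category": "database"},
]
terms = analyze("Searching the indexes")
counts = facet_counts(hits, "category")
```

Because both the indexed text and the query pass through the same `analyze` step, a query for "searching" can match a document that only contains "searches". That normalization, plus per-field counts alongside the results, is the kind of facility a dedicated search engine provides out of the box.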
Of course, there are several solutions around for indexing data from MongoDB stores in Lucene and its relatives, and even customizing this process is not hard. But it does mean maintaining another moving part in your infrastructure.
So I don't really see a need to compare MongoDB text search with Lucene, because ultimately they exist to do different things. It's just a matter of what you need for your application. Choose the solution that is best for you.
The only thing to add is that the Lucene (and derivative) family are great products. Do not shy away from giving them a go, at least to evaluate. The point from before is that there is a lot more power there than in any "Standard Database Text Search". Furthermore, the admin and learning curve are generally "not as hard as you think". Have a play; it may be worth implementing.