With the NoSQL movement growing based on document-based databases, I've looked at MongoDB lately. I have noticed a striking similarity with how to treat items as "Documents", just like Lucene does (and users of Solr).
So, the question: Why would you want to use NoSQL (MongoDB, Cassandra, CouchDB, etc) over Lucene (or Solr) as your "database"?
What I am (and I am sure others are) looking for in an answer is some deep-dive comparisons of them. Let's skip over relational database discussions all together, as they serve a different purpose.
Lucene gives some serious advantages, such as powerful searching and weight systems. Not to mention facets in Solr (which Solr is being integrated into Lucene soon, yay!). You can use Lucene documents to store IDs, and access the documents as such just like MongoDB. Mix it with Solr, and you now get a WebService-based, load balanced solution.
You can even throw in a comparison of out-of-proc cache providers such as Velocity or MemCached when talking about similar data storing and scalability of MongoDB.
The restrictions around MongoDB reminds me of using MemCached, but I can use Microsoft's Velocity and have more grouping and list collection power over MongoDB (I think). Can't get any faster or scalable than caching data in memory. Even Lucene has a memory provider.
MongoDB (and others) do have some advantages, such as the ease of use of their API. New up a document, create an id, and store it. Done. Nice and easy.
Solr is built on top of lucene to provide a search platform. SOLR is a wrapper over Lucene index. It is simple to understand: SOLR is car and Lucene is its engine. You just need to know how to drive car (SOLR) and also need to know few things of engine (Lucene) in case if there will be any issue in your car engine.
Solr is a search engine at heart, but it is much more than that. It is a NoSQL database with transactional support. It is a document database that offers SQL support and executes it in a distributed manner.
MongoDB supports advanced features for searching any field or range of queries or regular expression while NoSQL databases are more flexible in data storage and processing. MongoDB uses the features of sharding to scale horizontally.
The term "NoSQL" is a marketing brand that helps them to explain their value. You're right that Lucene/Solr is technically very similar to a NoSQL document store: it's a denormalized bag of documents (their term) with fields that aren't necessarily consistent across the collection of documents.
This is a great question, something I have pondered over quite a bit. I will summarize my lessons learned:
You can easily use Lucene/Solr in lieu of MongoDB for pretty much all situations, but not vice versa. Grant Ingersoll's post sums it up here.
MongoDB etc. seem to serve a purpose where there is no requirement of searching and/or faceting. It appears to be a simpler and arguably easier transition for programmers detoxing from the RDBMS world. Unless one's used to it Lucene & Solr have a steeper learning curve.
There aren't many examples of using Lucene/Solr as a datastore, but Guardian has made some headway and summarize this in an excellent slide-deck, but they too are non-committal on totally jumping on Solr bandwagon and "investigating" combining Solr with CouchDB.
Finally, I will offer our experience, unfortunately cannot reveal much about the business-case. We work on the scale of several TB of data, a near real-time application. After investigating various combinations, decided to stick with Solr. No regrets thus far (6-months & counting) and see no reason to switch to some other.
Summary: if you do not have a search requirement, Mongo offers a simple & powerful approach. However if search is key to your offering, you are likely better off sticking to one tech (Solr/Lucene) and optimizing the heck out of it - fewer moving parts.
My 2 cents, hope that helped.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With