Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why are document stores like Lucene / Solr not included in NoSQL conversations?

Tags:

nosql

solr

lucene

All of us have come across the recent hype of no-SQL solutions lately. MongoDB, CouchDB, BigTable, Cassandra, and others have been listed as no-SQL options. Here's an example:

http://architects.dzone.com/articles/what-nosql-store-should-i-use

However, three years ago a co-worker and I were using Lucene.NET as what seem to fit the description of no-SQL. We did not use it just for user-inputted search queries; we used it to make a few reindexed RDBMS table data extremely performant. We implemented our own .NET sort-of-equivalent-to-Solr service to manage these indexes and make them callable. When I left the company, the team switched to Solr itself. (For those not in the know, Solr is a web service that wraps Lucene with REST-callable queries and index dumps.)

What I don't understand is, why is Solr not counted in the typical lists of no-SQL solution options? Am I missing something here? I assume that there are technical reasons why Solr is not comparable to the likes of CouchDB, etc., and in fact I understand that CouchDB uses Lucene as its data store (yes?), but what disqualifies Solr?

I'm not asking as some kind of Solr fanboy or anything, I just don't understand why Solr and the like don't fit the definition of no-SQL, and if Solr technically does fit the definition then what about it likely makes people pooh-pooh it? I'm asking because I'm having difficulty determining whether I should continue using Lucene-based solutions (like Solr) for solutions that I build or if I should really do more research with these other options.

like image 776
Jon Davis Avatar asked Jul 26 '10 23:07

Jon Davis


People also ask

Is Lucene a NoSQL database?

Apache Solr is a subproject of Apache Lucene, which is the indexing technology behind most recently created search and index technology. Solr is a search engine at heart, but it is much more than that. It is a NoSQL database with transactional support.

Why NoSQL is not good for transactions?

Most databases in NoSQL do not perform ACID transactions. Modern applications requiring these properties in their final transactions cannot find a good use of NoSQL. It does not use structured query language and are not preferred for structured data.

What is the difference between Solr and Lucene?

Lucene is a full-text search engine library, whereas Solr is a full-text search engine web application built on Lucene. One way to think about Lucene and Solr is as a car and its engine. The engine is Lucene; the car is Solr. A wide array of companies (Ford, Salesforce, etc.)

Is MongoDB based on Lucene?

Amazon and MongoDB both use Lucene every day, and the most important use case is no doubt application search, in which the engine is primarily used by humans.


2 Answers

I once listened to an interview with author Ursula K. LeGuin about fiction writing. The interviewer asked her about authors who work in different genre of writing. What makes one author a romance writer, and another a mystery writer, and another a science fiction writer? LeGuin responded by explaining:

Genre is about marketing, not about content.

It was an eye-opening statement.

I think the same applies to technology solutions. The NoSQL movement is attracting attention because it's full of marketing energy right now. NoSQL data stores like Hadoop, CouchDB, MongoDB, have commercial ventures backing them, pushing their solutions as new and innovative and exciting so they can grow their business. The term "NoSQL" is a marketing brand that helps them to explain their value.

You're right that Lucene/Solr is technically very similar to a NoSQL document store: it's a denormalized bag of documents (their term) with fields that aren't necessarily consistent across the collection of documents. It's indexed in a sophisticated way to allow you to search across all fields or by specific fields.

But that's not the genre Lucene uses to explain its value. They don't have the same mission to grow a market and a business, since they're managed by the Apache Foundation. They're happy to focus on the use case of fulltext search, even though the technology could be used in other ways. They're following a tenet of software success: do one thing, and do it well.

like image 156
Bill Karwin Avatar answered Nov 10 '22 23:11

Bill Karwin


After doing more Google-searching, I think this document sums it up pretty well:

https://web.archive.org/web/20100504055638/http://www.lucidimagination.com/blog/2010/04/30/nosql-lucene-and-solr/

Case in point, Lucene/Solr is NoSql and could be considered one of NoSql's more mature "forefathers". It just does not get the NoSql hype it deserves because it didn't invent the term "no-SQL" and its users don't use the term, so the hype machine overlooked it.

like image 42
Jon Davis Avatar answered Nov 10 '22 22:11

Jon Davis