Using Solr search index as a database - is this "wrong"?

Yes, you can use Solr as a database, but there are some really serious caveats:

Solr's most common access pattern, which is over HTTP, doesn't respond particularly well to batch querying. Furthermore, a standard Solr query does NOT stream data, so you can't lazily iterate through millions of records in a single request; later releases (4.7+) added cursor-based deep paging via the cursorMark parameter, but that is still page-at-a-time retrieval. This means you have to be very thoughtful when you design large-scale data access patterns with Solr.
Although Solr performance scales horizontally (more machines, more cores, etc.) as well as vertically (more RAM, better machines, etc.), its querying capabilities are severely limited compared to those of a mature RDBMS. That said, there are some excellent functions, like the field stats queries, which are quite convenient.
Developers who are used to relational databases will often run into problems when they apply the same DAO design patterns in a Solr paradigm, because of the way Solr uses filters in queries. There will be a learning curve for developing the right approach to building an application that uses Solr for part of its large queries or stateful modifications.
The "enterprisey" tools for advanced session management and stateful entities that many advanced frameworks (Rails, Hibernate, ...) offer will have to be thrown completely out the window.
Relational databases are meant to deal with complex data and relationships, and they are thus accompanied by state-of-the-art metrics and automated analysis tools. In Solr, I've found myself writing such tools and manually stress-testing a lot, which can be a time sink.
Joining: this is the big killer. Relational databases support methods for building and optimizing views and queries that join tuples based on simple predicates. In Solr, there aren't any robust methods for joining data across indices.
Resiliency: For high availability, SolrCloud relies on sharding and replication coordinated through ZooKeeper, and it can optionally keep its indexes on a distributed file system (i.e. HCFS). This model is quite different from that of a relational database, which usually achieves resiliency using slaves and masters, or RAID, and so on. So you have to be ready to provide the resiliency infrastructure Solr requires if you want it to be cloud-scalable and resilient.
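
On the lazy-iteration caveat above: newer Solr versions (4.7+) support cursor-based deep paging. You send cursorMark=* together with a sort that includes the uniqueKey field, then keep passing back the nextCursorMark from each response until it stops changing. Here is a minimal sketch of wrapping that loop in a Java Iterator. The CursorIterator and Page types and the fetchPage function are hypothetical names for illustration; fetchPage stands in for whatever HTTP or SolrJ call actually runs the query.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/** Lazily walks a large result set one page at a time using the cursorMark protocol. */
final class CursorIterator<T> implements Iterator<T> {

    /** One page of documents plus the cursor returned with it (hypothetical holder type). */
    static final class Page<T> {
        final List<T> docs;
        final String nextCursorMark;

        Page(List<T> docs, String nextCursorMark) {
            this.docs = docs;
            this.nextCursorMark = nextCursorMark;
        }
    }

    // Maps a cursorMark to the page it yields; the real query call lives behind this (assumption).
    private final Function<String, Page<T>> fetchPage;
    private final Deque<T> buffer = new ArrayDeque<>();
    private String cursorMark = "*"; // "*" asks Solr to start a new cursor
    private boolean exhausted = false;

    CursorIterator(Function<String, Page<T>> fetchPage) {
        this.fetchPage = fetchPage;
    }

    @Override
    public boolean hasNext() {
        // Only fetch the next page once the current one has been fully consumed.
        while (buffer.isEmpty() && !exhausted) {
            Page<T> page = fetchPage.apply(cursorMark);
            buffer.addAll(page.docs);
            // Solr signals the end by returning the same cursorMark that was sent.
            if (page.nextCursorMark.equals(cursorMark)) {
                exhausted = true;
            }
            cursorMark = page.nextCursorMark;
        }
        return !buffer.isEmpty();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return buffer.removeFirst();
    }
}
```

This only keeps one page in memory at a time, but it is still a sequence of separate requests rather than a true stream, so the point about designing access patterns carefully still applies.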

That said, there are plenty of obvious advantages to Solr for certain tasks (see http://wiki.apache.org/solr/WhyUseSolr): loose queries are much easier to run and return meaningful results. Indexing is done as a matter of default, so most arbitrary queries run pretty effectively (unlike an RDBMS, where you often have to optimize and de-normalize after the fact).

Conclusion: Even though you CAN use Solr as an RDBMS, you may find (as I have) that there is ultimately "no free lunch": the benefits of super-cool Lucene text searches and high-performance, in-memory indexing are often paid for by less flexibility and the adoption of new data access workflows.

It's perfectly reasonable to use Solr as a database, depending on your application. In fact, that's pretty much what guardian.co.uk is doing.

It's definitely not bad practice per se. It's only bad if you use it the wrong way, just like any other tool at any level, even GOTOs.

When you say "An XML representation..." I assume you're talking about having multiple stored Solr fields and retrieving them using Solr's XML format, and not just one big XML-content field (which would be a terrible use of Solr). The fact that Solr uses XML as its default response format is largely irrelevant; you can also use a binary protocol, so it's quite comparable to traditional relational databases in that regard.
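
To make that concrete: the response format is just a request parameter (wt, e.g. xml, json, or the binary javabin format SolrJ uses), and fl picks which stored fields come back. Below is a rough sketch that builds such a select URL using only the JDK. The SolrSelectUrl class is a hypothetical helper, and the host, the collection name "products", and the field names are made-up assumptions.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper that assembles a Solr /select URL from request parameters. */
final class SolrSelectUrl {

    static String build(String baseUrl, String collection, Map<String, String> params) {
        // Encode each key and value so characters like ':' and ',' survive the query string.
        String query = params.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
        return baseUrl + "/" + collection + "/select?" + query;
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8); // needs Java 10+
    }
}
```

Calling it with q=id:42, fl=id,title,price and wt=json (passed in a LinkedHashMap to keep the order) would ask for three stored fields of one document as JSON; swapping wt changes the wire format without touching the data model.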

Ultimately, it's up to your application's needs. Solr is primarily a text search engine, but can also act as a NoSQL database for many applications.

This was probably done for performance reasons; if it doesn't cause any problems, I would leave it alone. There is a big grey area around what should be in a traditional database vs. a Solr index. I've seen people do similar things (usually key-value pairs or JSON instead of XML) for UI presentation, fetching the real object from the database only if it is needed for updates or deletes. All reads just go to Solr.
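
A rough sketch of that split, as I understand it: reads are served from the index, while updates and deletes go through the database first and then refresh the index. Everything here (CatalogService, SearchIndex, Database, and their methods) is a hypothetical name for illustration, not a Solr or JDBC API; the two interfaces would be backed by a real Solr client and a real database in practice.

```java
import java.util.Optional;
import java.util.function.UnaryOperator;

/** Hypothetical service: all reads hit the search index, writes go through the database. */
final class CatalogService {

    /** Stand-in for a Solr client holding the display-ready document. */
    interface SearchIndex {
        Optional<String> read(String id);
        void put(String id, String doc);
        void remove(String id);
    }

    /** Stand-in for the system of record. */
    interface Database {
        Optional<String> load(String id);
        void save(String id, String doc);
        void delete(String id);
    }

    private final SearchIndex index;
    private final Database db;

    CatalogService(SearchIndex index, Database db) {
        this.index = index;
        this.db = db;
    }

    /** Reads never touch the database. */
    Optional<String> view(String id) {
        return index.read(id);
    }

    /** Updates load the real object from the database, save it, then refresh the index. */
    boolean update(String id, UnaryOperator<String> change) {
        Optional<String> current = db.load(id);
        if (!current.isPresent()) {
            return false;
        }
        String updated = change.apply(current.get());
        db.save(id, updated);
        index.put(id, updated);
        return true;
    }

    /** Deletes remove the row first, then the indexed copy. */
    void delete(String id) {
        db.delete(id);
        index.remove(id);
    }
}
```

The sketch ignores the hard part, which is keeping the two stores consistent when one of the writes fails; how much that matters depends on how stale a read your UI can tolerate.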

I've seen similar things done because it allows for very fast lookup. We're moving data out of our Lucene indexes into a fast key-value store to follow DRY principles and also decrease the size of the index. There's not a hard-and-fast rule for this sort of thing.
