Can one make Apache Solr index transactionally consistent with DB being indexed?

I am new to Solr. I am trying to build a server that stores structured data in a database and that can be searched using Solr/Lucene. The server can be clustered into any number of identical nodes for high availability.

It seems that in the standard configuration, Solr stores the index in files on the file system. This seems to introduce some problems with consistency and clustering.

How do I make the index transactionally consistent with the DB? Is there a way to do this? (e.g. some way to make commits to the DB coordinated with commits to the Solr index?)

Is there any way to store the index in the (relational) DB? This would solve the consistency problems and cluster problems, but I don't find a lot of literature on how to do this.

When configured as a cluster, does each cluster node need to maintain its own copy of the index? It is not clear whether multiple instances of Solr can update a single index or not.

Or do we give up, accept that the index is not guaranteed to be consistent, and rebuild it every day or so? What do people normally do about this?

asked Oct 19 '12 02:10

AgilePro

2 Answers

Q> How do I make the index transactionally consistent with the DB?
A> You can't. You can probably invent another transaction layer on top, but it will take ages to develop and you won't reach 100% consistency anyway. You could, for example, send the data to both the DB and Solr and commit only after both have received it, but this will not be atomic: one commit can succeed while the other fails.
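
To illustrate why the dual write above is not atomic, here is a minimal sketch. It uses an in-memory SQLite database and a hypothetical `FakeIndex` class that stands in for a Solr core (it is not a Solr API); the table name and fields are assumptions made for the example.

```python
import sqlite3


class FakeIndex:
    """Hypothetical in-memory stand-in for a Solr core (not a Solr API)."""

    def __init__(self, fail_on_commit=False):
        self.pending = {}
        self.committed = {}
        self.fail_on_commit = fail_on_commit

    def add(self, doc):
        # Stage the document; it is not visible until commit().
        self.pending[doc["id"]] = doc

    def commit(self):
        if self.fail_on_commit:
            raise RuntimeError("index commit failed")
        self.committed.update(self.pending)
        self.pending.clear()


def dual_write(db, index, doc):
    """Stage the document in both stores, then commit each one in turn."""
    db.execute(
        "INSERT INTO items (id, title) VALUES (?, ?)",
        (doc["id"], doc["title"]),
    )
    index.add(doc)
    db.commit()     # the DB commit succeeds first ...
    index.commit()  # ... and nothing ties this commit to the one above


def demo():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE items (id TEXT PRIMARY KEY, title TEXT)")
    index = FakeIndex(fail_on_commit=True)  # simulate a failure between commits
    try:
        dual_write(db, index, {"id": "1", "title": "hello"})
    except RuntimeError:
        pass
    rows_in_db = db.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    return rows_in_db, len(index.committed)
```

When the index commit fails, the row is already durable in the database but missing from the index, so the two stores disagree until something repairs the gap.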

Q> Is there any way to store the index in the (relational) DB?
A> With Lucene 4.0, you probably can (by writing your own codec). But this won't solve your problem.

Q> When configured as a cluster, does each cluster node need to maintain its own copy of the index?
A> Yes.

Q> It is not clear whether multiple instances of Solr can update a single index or not.
A> Multiple Lucene/Solr instances can't write to the same index file(s). The most you can do is create multiple IndexSearchers on the same index, but Solr probably does this for you anyway.

Q> do we give up, accept that the index is not guaranteed to be consistent?
A> Yes. I think you are too DB-centric. Think about Solr/Lucene as you think about Google: I bet they don't roll out their entire index atomically throughout the world. If search results have minor inconsistencies depending on which server you hit (for a few seconds, of course), it's not a big deal.

Q> rebuild it every day or so? What do people normally do about this?
A> Lucene has near-real-time search, but at the basic level you just send index updates and commit as DB changes happen, then reopen the index reader to see these updates. This is all done automagically in Solr.
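
As a rough sketch of "send index updates as DB changes happen", the snippet below builds a JSON update request for Solr's `/update` handler and uses the `commitWithin` parameter to ask Solr to make the documents searchable within a given number of milliseconds. The base URL, the core name `items`, and the document fields are assumptions for illustration; `build_update_request` and `send_update` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request


def build_update_request(solr_url, core, docs, commit_within_ms=1000):
    """Build (but do not send) a POST that adds or replaces docs in a core."""
    query = urllib.parse.urlencode({"commitWithin": commit_within_ms})
    url = f"{solr_url.rstrip('/')}/{core}/update?{query}"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_update(request, timeout=10):
    """Send the request; needs a reachable Solr instance."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Example: call this right after the corresponding DB transaction commits.
request = build_update_request(
    "http://localhost:8983/solr",  # assumed local Solr address
    "items",                       # assumed core name
    [{"id": "1", "title": "hello"}],
)
```

Sending the update only after the DB commit keeps the database as the source of truth; if the update fails, the index is briefly stale rather than ahead of the DB.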

answered Oct 11 '22 06:10

mindas

I know this is a bit old, but it might help someone. You can try SolrCloud with Apache ZooKeeper.

Out of the box, Apache Solr includes the ability to set up a cluster of Solr servers that combines fault tolerance and high availability. Called SolrCloud, these capabilities provide distributed indexing and search, supporting the following features with little configuration:

Central configuration for the entire cluster
Automatic load balancing and fail-over for queries
ZooKeeper integration for cluster coordination and configuration.

ZooKeeper acts as the cluster coordinator for Solr, and the two work really well together.
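
As a small, hedged example of what working with SolrCloud looks like, the snippet below builds a URL for the Collections API `CREATE` action, which creates a collection split into shards with replicas. The host, the collection name `items`, and the shard and replica counts are assumptions for illustration; `build_create_collection_url` is a hypothetical helper name.

```python
import urllib.parse


def build_create_collection_url(solr_url, name, num_shards=2, replication_factor=2):
    """Build the Collections API URL that creates a sharded, replicated collection."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    query = urllib.parse.urlencode(params)
    return f"{solr_url.rstrip('/')}/admin/collections?{query}"


# Assumed local SolrCloud node; issue an HTTP GET against this URL to create it.
create_url = build_create_collection_url("http://localhost:8983/solr", "items")
```

With more than one replica per shard, each node holding a replica keeps its own copy of that shard's index, and SolrCloud forwards updates to the replicas.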

https://cwiki.apache.org/confluence/display/solr/SolrCloud

http://zookeeper.apache.org/doc/trunk/zookeeperOver.html

answered Oct 11 '22 06:10

Victor Odiah

Tags: solr, lucene, transactions, consistency, cluster-computing
