
Solr "real time" indexing

I know there are several questions similar to this, but they don't provide a simple answer to the problem at hand. Sorry if you feel this is a duplicate, but I think a clear and understandable answer would benefit many. So, to the question.

Can Solr indexing updates be automated? And if they can, what would be the optimal way to do it?

Here is a simple use case to clarify the question: I have a database table with several columns holding different kinds of data. There is a web app which is used to manage the data. I've got a separate Solr server that indexes specified columns of the above-mentioned table. How can I make sure that when a user adds, removes or modifies data in that table, Solr notices the change and updates the index?

It would be necessary for it to be "real time", meaning that the changes would take effect within a few seconds. Of course, with a large amount of data it can take longer.

Thanks in advance

frustrated asked Aug 10 '11 13:08




2 Answers

There are two questions here:

Can Solr indexing updates be automated?

Yes, they can, and they should always be automated. You don't want to manually launch the indexing process for every change.
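One way to automate this from the web app itself is to translate every table change into a Solr update command at the point where the app writes to the database. The sketch below is only an illustration under assumptions: the function name `solr_update_payload`, the operation names and the `id` key column are hypothetical, and it only builds the JSON body in Solr's JSON update format without sending it.

```python
import json


def solr_update_payload(operation, row, indexed_fields, id_field="id"):
    """Translate one table change into a Solr JSON update command (as a string).

    operation      -- "insert", "update" or "delete" (hypothetical names)
    row            -- dict holding the changed row, including its key column
    indexed_fields -- the columns that should end up in the Solr index
    """
    if operation in ("insert", "update"):
        # Solr overwrites an existing document that has the same unique key,
        # so inserts and updates both map to an "add" command.
        doc = {name: row[name] for name in indexed_fields if name in row}
        doc[id_field] = row[id_field]
        return json.dumps({"add": {"doc": doc}})
    if operation == "delete":
        # Only the unique key is needed to remove a document.
        return json.dumps({"delete": {"id": str(row[id_field])}})
    raise ValueError(f"unknown operation: {operation!r}")
```

The web app would call this right after a successful database write and post the result to the Solr update handler. Columns that are not listed in `indexed_fields` never leave the database.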

It would be necessary for it to be "real time".

I already mentioned some ways to reduce the latency between a data change and the corresponding index update in this answer. You could use autoCommit to make sure that your data is committed within x seconds of the update. Depending on the interval, you'd want to reduce autowarming and adjust other settings; see this for more details.
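autoCommit itself is configured on the server in solrconfig.xml. A related, client-side mechanism is the `commitWithin` request parameter, which asks Solr to commit the posted documents within a given number of milliseconds. The following is a minimal sketch of that alternative, not of autoCommit: the function name, the core URL and the 5000 ms default are assumptions, and the `/update` path with a JSON array body assumes a reasonably recent Solr (older versions used `/update/json`).

```python
import json
import urllib.parse
import urllib.request


def build_update_request(solr_url, docs, commit_within_ms=5000):
    """Build (but do not send) a POST that adds docs with a commit deadline.

    solr_url         -- base URL of the core, e.g. http://localhost:8983/solr/mycore
    docs             -- list of dicts, one per document
    commit_within_ms -- upper bound, in milliseconds, before Solr commits
    """
    query = urllib.parse.urlencode({"commitWithin": commit_within_ms})
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        f"{solr_url.rstrip('/')}/update?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the URL and body easy to inspect before anything touches a live server.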

Also keep an eye on the NRT wiki page for related information and solutions about this.

Mauricio Scheffer answered Sep 18 '22 06:09



You may want to take a look at Apache Solr 3.3 with RankingAlgorithm 1.2. It supports near-real-time (NRT) indexing and can update 10,000 docs/sec. You can search concurrently while the updates are running, and you do not need to commit or close the searchers. More information about NRT with Solr 3.3 and RankingAlgorithm is available here:

http://solr-ra.tgels.org/wiki/en/Near_Real_Time_Search_ver_3.x

user925543 answered Sep 22 '22 06:09
