Solr and MySQL: how to keep an updated index, and is a DB even needed if it's simple?

I'm a complete beginner with Solr, so bear with me. :)

In my current project I have a very simple DB - just 1 table that contains 4 fields: id, name, subject, msg.

As I understand it, every time a record is added (or removed), I'd also need to add it to (or remove it from) the index. That means performing two operations: writing the record to the DB and updating the index.
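The two-step flow described above can be sketched as follows. This is a minimal sketch, assuming Java 16+ (for records). `MessageStore` and `SearchIndex` are hypothetical stand-ins for a Hibernate DAO and a SolrJ client; in real code the index side would build a `SolrInputDocument` and call `SolrClient.add`, `deleteById` and `commit`. The Solr field names are assumed to match the table columns.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class DualWriteSketch {

    // Mirrors the question's single table: id, name, subject, msg.
    record Message(long id, String name, String subject, String msg) {}

    // Hypothetical stand-in for the Hibernate DAO/Session.
    interface MessageStore {
        void save(Message m);
        void delete(long id);
    }

    // Hypothetical stand-in for the search side; with SolrJ this would wrap
    // SolrClient.add(SolrInputDocument), deleteById(String) and commit().
    interface SearchIndex {
        void add(Map<String, Object> doc);
        void deleteById(String id);
    }

    static final class MessageService {
        private final MessageStore store;
        private final SearchIndex index;

        MessageService(MessageStore store, SearchIndex index) {
            this.store = store;
            this.index = index;
        }

        // Two operations per change: write the row first, then index it.
        void create(Message m) {
            store.save(m);
            index.add(toDocument(m));
        }

        // Removal also touches both stores.
        void remove(long id) {
            store.delete(id);
            index.deleteById(Long.toString(id));
        }

        // Assumed field names; they must match the fields defined in the Solr schema.
        static Map<String, Object> toDocument(Message m) {
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put("id", Long.toString(m.id()));
            doc.put("name", m.name());
            doc.put("subject", m.subject());
            doc.put("msg", m.msg());
            return doc;
        }
    }

    public static void main(String[] args) {
        // In-memory fakes so the sketch runs without MySQL or Solr.
        Map<Long, Message> table = new HashMap<>();
        Map<String, Map<String, Object>> docs = new HashMap<>();
        MessageStore store = new MessageStore() {
            public void save(Message m) { table.put(m.id(), m); }
            public void delete(long id) { table.remove(id); }
        };
        SearchIndex index = new SearchIndex() {
            public void add(Map<String, Object> doc) { docs.put((String) doc.get("id"), doc); }
            public void deleteById(String id) { docs.remove(id); }
        };
        MessageService service = new MessageService(store, index);
        service.create(new Message(1, "Val", "Solr", "Hello"));
        service.create(new Message(2, "Bart", "Re: Solr", "Use DIH"));
        service.remove(1);
        System.out.println(table.size() + " row(s), " + docs.size() + " doc(s)"); // prints "1 row(s), 1 doc(s)"
    }
}
```

Writing the row first keeps the DB as the source of truth. If the index call fails after the save, the two stores drift apart; one way to reconcile them is a periodic full reindex from the DB.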

Is this standard procedure, or is there a way to direct Solr to automatically reindex the DB table either at some interval or whenever there are updates?

Also, since the table is so simple, does it even make sense to store this info in the DB? Why not just keep it in the Solr index, considering that I want the records to be searchable by name, subject, and msg?

My setup is Java, Hibernate, MySQL, and SolrJ.

Val Schuman asked Apr 13 '11 14:04



1 Answer

Whether to use a database really comes down to how long you want to keep this data and how much you expect it to grow. It is much, much easier to corrupt a whole Solr index (and lose all of your data) than it is to corrupt a whole database. Solr also lacks good support for modifying a schema without starting from a fresh index: you can add a new field without trouble, but you cannot rename a field or change its type without wiping out your index.
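Keeping the DB as the source of truth means a corrupted index or a renamed field is recoverable: you wipe the index and rebuild it from the table. Below is a minimal sketch of that rebuild loop, assuming Java 16+. `fetchPage` is a hypothetical stand-in for a paged Hibernate query (`Query.setFirstResult`/`setMaxResults`), and a plain `Map` stands in for the Solr index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

public class RebuildIndexSketch {

    record Message(long id, String name, String subject, String msg) {}

    // Rebuilds the index from the DB in pages of `batchSize` rows.
    // fetchPage.apply(offset, limit) returns at most `limit` rows starting at `offset`.
    static int rebuild(BiFunction<Integer, Integer, List<Message>> fetchPage,
                       Map<String, Message> index, int batchSize) {
        // Start from an empty index (with SolrJ: deleteByQuery("*:*"), or a fresh core).
        index.clear();
        int offset = 0;
        while (true) {
            List<Message> page = fetchPage.apply(offset, batchSize);
            if (page.isEmpty()) {
                break;
            }
            // With SolrJ, each page would be sent to Solr in a single add(...) call.
            for (Message m : page) {
                index.put(Long.toString(m.id()), m);
            }
            offset += page.size();
        }
        // With SolrJ, commit once here, after every batch has been sent.
        return offset; // number of rows reindexed
    }

    public static void main(String[] args) {
        // Fake table of 250 rows standing in for MySQL.
        List<Message> table = new ArrayList<>();
        for (long i = 1; i <= 250; i++) {
            table.add(new Message(i, "user" + i, "subject " + i, "body " + i));
        }
        BiFunction<Integer, Integer, List<Message>> fetchPage = (offset, limit) ->
            table.subList(Math.min(offset, table.size()), Math.min(offset + limit, table.size()));
        Map<String, Message> index = new HashMap<>();
        System.out.println(rebuild(fetchPage, index, 100)); // prints 250
    }
}
```

Paging keeps memory use flat regardless of table size, and committing once at the end avoids paying Solr's commit cost per batch.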

If you do go with a DB, you can set up Solr to index directly from it using the DataImportHandler. For your schema this should be pretty straightforward, but it can get painful quickly as your DB grows more complex. I think there is some advantage to reusing the Hibernate objects you already have set up and simply inserting them with SolrJ. The other pain point with the DataImportHandler is that it is controlled entirely over HTTP, so you need to manage separate cron jobs (or some other code) that handle the scheduling by calling it with wget or curl.
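Instead of a cron job calling wget or curl, the scheduling can also live inside the Java application. The sketch below assumes Java 11+ (`java.net.http.HttpClient`), a hypothetical Solr at `http://localhost:8983/solr/` with a core named `messages`, and a DIH request handler registered at `/dataimport` in `solrconfig.xml`. It sends `command=delta-import` every five minutes. Note that delta-import only picks up changes if `data-config.xml` defines a `deltaQuery`, which usually relies on a last-modified column the question's table does not have. For a table this small, `command=full-import` may be the simpler choice. DIH was also deprecated in Solr 8.6 and removed from the core distribution in Solr 9.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DeltaImportScheduler {

    // Builds the DIH command URL. The "/dataimport" path is an assumption:
    // it is whatever name the handler is registered under in solrconfig.xml.
    static URI deltaImportUri(String solrBase, String core) {
        String base = solrBase.endsWith("/") ? solrBase.substring(0, solrBase.length() - 1) : solrBase;
        return URI.create(base + "/" + core + "/dataimport?command=delta-import");
    }

    // Sends one delta-import request and returns the HTTP status code.
    static int triggerDeltaImport(HttpClient client, URI uri) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) {
        // Hypothetical Solr location and core name.
        URI uri = deltaImportUri("http://localhost:8983/solr/", "messages");
        HttpClient client = HttpClient.newHttpClient();
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Every 5 minutes: the in-process equivalent of a cron job running curl.
        scheduler.scheduleAtFixedRate(() -> {
            try {
                System.out.println("delta-import -> HTTP " + triggerDeltaImport(client, uri));
            } catch (Exception e) {
                // Log and keep going; an uncaught exception would cancel all later runs.
                System.err.println("delta-import failed: " + e.getMessage());
            }
        }, 0, 5, TimeUnit.MINUTES);
    }
}
```

The `try`/`catch` inside the task matters: `scheduleAtFixedRate` suppresses all subsequent executions once a run throws, so an unhandled network error would silently stop reindexing.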

Bart answered Oct 13 '22 05:10