I'm a complete beginner with Solr, so bear with me. :)
In my current project I have a very simple DB - just 1 table that contains 4 fields: id, name, subject, msg.
The way I understand it, every time a new record is added (or removed), I'd need to update the index accordingly, essentially performing two operations: inserting the record into the DB and adding it to the index.
Is this standard procedure, or is there a way to direct Solr to automatically reindex the DB table either at some interval or whenever there are updates?
Also, since the table is so simple, does it even make sense to store this info in the DB? Why not just keep it in the Solr index, considering that I want the records to be searchable by name, subject, and msg?
My setup is Java, Hibernate, MySQL, and Solrj.
Solr is a search engine at heart, but it is much more than that. It is a NoSQL database with transactional support. It is a document database that offers SQL support and executes it in a distributed manner.
A Solr (and underlying Lucene) index is a specially designed data structure, stored on the file system as a set of index files. The index is built from efficient data structures to maximize performance and minimize resource usage.
By adding content to an index, we make it searchable by Solr. A Solr index can accept data from many different sources, including XML files, comma-separated value (CSV) files, data extracted from tables in a database, and files in common file formats such as Microsoft Word or PDF.
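For illustration, here is a minimal SolrJ sketch of adding one record to the index. The core name ("records"), the local Solr URL, and the sample values are assumptions; the field names match the table from the question, and the exact client class may differ depending on your SolrJ version.

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class IndexExample {
    public static void main(String[] args) throws Exception {
        // Assumed local Solr core named "records"; adjust the URL for your setup.
        SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/records").build();

        // Build a document mirroring the table's four fields.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        doc.addField("name", "Alice");
        doc.addField("subject", "Hello");
        doc.addField("msg", "First message");

        solr.add(doc);   // send the document to Solr
        solr.commit();   // make it visible to searches
        solr.close();
    }
}
```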
Whether or not to use a database really boils down to how long-term you intend to keep and grow this data. It is much, much easier to corrupt a whole Solr index (and lose all of your data) than it is to corrupt a whole database. Also, Solr does not have great support for modifying a schema without starting with a fresh index: you can add another field just fine, but you cannot change the name or type of an existing field without wiping out your index.
If you do go with a DB, you can set up Solr to index directly from the DB using the DataImportHandler. For your schema this should be pretty straightforward, but it can get painful quickly as your DB gets more complex. I think there is some advantage to using the Hibernate objects you already have set up and just inserting them with SolrJ, as sketched below. The other pain point with the DataImportHandler is that it is controlled entirely over HTTP, so you need to manage separate cron jobs (or some other code) to handle the scheduling using wget or curl.
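A rough sketch of the SolrJ approach, assuming a hypothetical `Record` entity (with getters for the four fields) and a hypothetical `RecordRepository` wrapper; the idea is simply to write through to both MySQL (via Hibernate) and the index (via SolrJ) so the two stay in sync:

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.common.SolrInputDocument;
import org.hibernate.Session;

// Hypothetical helper that writes through to both the DB and the Solr index.
public class RecordRepository {

    private final Session session;   // Hibernate session
    private final SolrClient solr;   // SolrJ client pointing at your core

    public RecordRepository(Session session, SolrClient solr) {
        this.session = session;
        this.solr = solr;
    }

    public void save(Record record) throws Exception {
        // 1) persist to the database
        session.beginTransaction();
        session.saveOrUpdate(record);
        session.getTransaction().commit();

        // 2) mirror the same fields into the index
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", record.getId());
        doc.addField("name", record.getName());
        doc.addField("subject", record.getSubject());
        doc.addField("msg", record.getMsg());
        solr.add(doc);
        solr.commit();
    }

    public void delete(Record record) throws Exception {
        // Remove from the database, then from the index.
        session.beginTransaction();
        session.delete(record);
        session.getTransaction().commit();

        solr.deleteById(String.valueOf(record.getId()));
        solr.commit();
    }
}
```

This keeps the indexing logic next to your existing Hibernate code instead of relying on an external scheduler, at the cost of doing two writes per save; whether that trade-off suits you depends on how much you care about the index lagging behind the DB.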