We need to design a system that lets users search large texts by keyword and, later on, produce basic reports on how often a keyword appears across all articles over a given period.
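For the reporting requirement, the core operation is "count occurrences of a keyword per month within a date range". The minimal in-memory sketch below shows that aggregation. The article records, the `keyword_frequency_by_month` function, and the whole-word, case-insensitive matching rule are all assumptions for illustration. In production this aggregation would more likely run inside the search engine or the database than in application code.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical article records: (published_on, body). In the real system these
# would come from the article store; here they are hard-coded for illustration.
ARTICLES = [
    (date(2024, 1, 5), "Solr indexes text. Solr is fast."),
    (date(2024, 1, 20), "MySQL stores metadata."),
    (date(2024, 2, 3), "We query Solr, then MySQL."),
]


def keyword_frequency_by_month(articles, keyword, start, end):
    """Count case-insensitive whole-word occurrences of `keyword` per
    (year, month) for articles published between `start` and `end` inclusive."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    counts = Counter()
    for published_on, body in articles:
        if start <= published_on <= end:
            counts[(published_on.year, published_on.month)] += len(pattern.findall(body))
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    print(keyword_frequency_by_month(ARTICLES, "solr", date(2024, 1, 1), date(2024, 2, 29)))
```

With the sample data, "solr" appears twice in January and once in February.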
We will have:
To do that, we came up with the following solution:
The system will search Solr by keyword and then look up the matching results in MySQL to retrieve additional information.
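The two-step flow can be sketched as "ask the search engine for matching IDs, then fetch the extra columns for those IDs from the relational database". This is a minimal sketch under stated assumptions: `SEARCH_INDEX` is a hypothetical dictionary standing in for Solr, an in-memory SQLite database stands in for MySQL so the example runs without a server, and the `articles` table with `id`, `title`, `author` columns is invented for illustration.

```python
import sqlite3

# Hypothetical stand-in for the Solr index: keyword -> article IDs in the order
# the search engine would return them. A real system would query Solr instead.
SEARCH_INDEX = {
    "solr": [1, 3],
    "mysql": [3, 2],
}


def search_ids(keyword):
    """Step 1: ask the search engine which article IDs match the keyword."""
    return SEARCH_INDEX.get(keyword.lower(), [])


def fetch_articles(conn, ids):
    """Step 2: load extra columns for those IDs from the relational DB,
    keeping the order the search engine returned them in."""
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, title, author FROM articles WHERE id IN ({placeholders})",
        ids,
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    return [by_id[i] for i in ids if i in by_id]


def search(conn, keyword):
    return fetch_articles(conn, search_ids(keyword))


def make_demo_db():
    """In-memory SQLite database standing in for MySQL so the sketch runs anywhere."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?, ?)",
        [(1, "Intro to Solr", "Ana"), (2, "MySQL tuning", "Bo"), (3, "Solr and MySQL", "Cy")],
    )
    return conn


if __name__ == "__main__":
    print(search(make_demo_db(), "Solr"))
```

Keeping the database side to a primary-key `IN (...)` lookup keeps it cheap; the trade-off is that every search depends on both systems being available.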
So, would this be a good approach?
If most searches will only cover articles added in the last month, would it be a good idea to keep two databases: one holding just the last month's articles to serve most searches, and another holding all the articles?
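The two-database idea boils down to a routing decision made per query. A minimal sketch, assuming each query has a start date and ends at "today", and treating "the last month" as a hypothetical 30-day window; `choose_store` and the store names are illustrative, not an existing API. A single index filtered by a date range is the natural baseline to benchmark this against.

```python
from datetime import date, timedelta

# Assumption: "the last month" is approximated as a fixed 30-day window.
RECENT_WINDOW = timedelta(days=30)


def choose_store(query_start, today):
    """Return which hypothetical store should serve a query whose date range
    runs from `query_start` up to `today`: the small "recent" store if the
    whole range fits inside the recent window, otherwise the "full" archive."""
    if query_start >= today - RECENT_WINDOW:
        return "recent"
    return "full"


if __name__ == "__main__":
    today = date(2024, 4, 1)
    print(choose_store(date(2024, 3, 20), today))  # recent
    print(choose_store(date(2024, 1, 1), today))   # full
```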
If you have any tips or tricks for improving this, they would be greatly appreciated.
Thanks in advance!
Full-text search means searching for text within large bodies of electronically stored text and returning results that contain some or all of the words in the query. In contrast, traditional search returns only exact matches.
Indexing is a data-structure technique that lets you retrieve records from a database quickly, without scanning every record.
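The two ideas above meet in the inverted index, the structure full-text engines build so they can find documents containing "some or all" query words without scanning every document. The toy version below is a simplified illustration, not how Solr implements it internally; the lowercase alphanumeric tokenizer and the function names are assumptions.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search_all(index, query):
    """Documents containing every query word (AND semantics)."""
    sets = [index.get(t, set()) for t in tokenize(query)]
    return set.intersection(*sets) if sets else set()


def search_any(index, query):
    """Documents containing at least one query word (OR semantics)."""
    return set().union(*(index.get(t, set()) for t in tokenize(query)))


if __name__ == "__main__":
    docs = {
        1: "Solr full-text search",
        2: "MySQL exact match",
        3: "Full-text search in MySQL",
    }
    index = build_index(docs)
    print(search_all(index, "full text mysql"))  # {3}
    print(search_any(index, "solr mysql"))       # {1, 2, 3}
```

Each lookup touches only the posting sets for the query's words, which is why the index stays fast as the number of documents grows.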
The information requirements are the most important part.
I think your solution is quite good. If you want faster responses on common queries, I would consider putting a memcache instance in front of Solr.
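Putting a cache in front of the search engine is usually done as a cache-aside lookup. The sketch below uses a tiny in-process dictionary with per-entry expiry as a stand-in for memcached, which in reality is a separate networked service. `TTLCache`, `cached_search`, and the query normalization (trim and lowercase) are hypothetical choices for illustration. The expiry time bounds how stale a cached result can be after new articles are indexed.

```python
import time


class TTLCache:
    """Tiny in-process cache with per-entry expiry, standing in for memcached."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)


def cached_search(cache, search_fn, query):
    """Cache-aside: return a cached result if present, otherwise call the
    search backend and store its result under a normalized key."""
    key = query.strip().lower()
    result = cache.get(key)
    if result is None:
        result = search_fn(query)
        cache.set(key, result)
    return result
```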
I am not sure about the two databases. You would have to weigh the performance benefit against the burden of moving records from the first DB to the second as they age. I doubt there is a huge benefit, but that is just a gut feeling; don't take my word for it, run experiments.
Also, have you considered that you may need a horizontally scalable solution if your dataset becomes very large?
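Horizontal scaling for search usually means splitting the index into shards, routing each document to one shard, and fanning queries out to all shards before merging the hits. Solr's own distributed mode (SolrCloud) handles this for you; the toy `ShardedIndex` below is only a hypothetical illustration of the idea, with whitespace tokenization and SHA-1-based placement chosen for simplicity.

```python
import hashlib


class ShardedIndex:
    """Toy horizontally partitioned keyword index: each document lives on one
    shard chosen by hashing its ID; queries fan out to all shards and merge."""

    def __init__(self, num_shards):
        # Each shard maps keyword -> set of document IDs.
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, doc_id):
        # A stable digest (unlike Python's randomized str hash) keeps
        # document placement identical across processes and restarts.
        digest = hashlib.sha1(str(doc_id).encode()).digest()
        return int.from_bytes(digest[:4], "big") % len(self.shards)

    def add(self, doc_id, text):
        shard = self.shards[self._shard_for(doc_id)]
        for word in set(text.lower().split()):
            shard.setdefault(word, set()).add(doc_id)

    def search(self, keyword):
        # Scatter the query to every shard, then gather and merge the hits.
        hits = set()
        for shard in self.shards:
            hits |= shard.get(keyword.lower(), set())
        return sorted(hits)


if __name__ == "__main__":
    index = ShardedIndex(num_shards=3)
    index.add(1, "Solr scales out")
    index.add(2, "MySQL replicas")
    index.add(3, "solr shards")
    print(index.search("SOLR"))  # [1, 3]
```

Note that plain modulo placement reshuffles most documents when the shard count changes, which is one reason production systems plan shard counts up front or use smarter placement.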