Designing a very large database for searching text

We need to design a system that lets users search large texts by keyword and, in the future, produce some basic reports on how often a keyword appears across all articles over a given period.

We will have:

  • about 200,000 articles added every day
  • each article text is about 2KB
  • articles are stored for 6 months
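As a rough sizing check on the figures above, here is a back-of-the-envelope sketch. It assumes "6 months" means about 183 days and counts only the raw article text, so it ignores index overhead and the MySQL metadata, which are not specified here.

```python
# Back-of-the-envelope sizing from the numbers in the question.
ARTICLES_PER_DAY = 200_000
ARTICLE_SIZE_BYTES = 2 * 1024   # "about 2KB" per article
RETENTION_DAYS = 183            # assumption: 6 months taken as ~183 days


def retained_articles(per_day=ARTICLES_PER_DAY, days=RETENTION_DAYS):
    """Number of articles held at steady state."""
    return per_day * days


def raw_text_bytes(per_day=ARTICLES_PER_DAY,
                   size=ARTICLE_SIZE_BYTES,
                   days=RETENTION_DAYS):
    """Raw text volume at steady state, excluding any index overhead."""
    return per_day * size * days


total_articles = retained_articles()
total_gib = raw_text_bytes() / 1024 ** 3
daily_mib = ARTICLES_PER_DAY * ARTICLE_SIZE_BYTES / 1024 ** 2

print(f"articles retained: {total_articles:,}")
print(f"raw text retained: {total_gib:.1f} GiB")
print(f"raw text per day:  {daily_mib:.1f} MiB")
```

Under these assumptions that is about 36.6 million articles and roughly 70 GiB of raw text at steady state.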

To do that, we came up with the following solution:

  • create a Solr index to store the articles
  • use a MySQL database to store the additional article information

The system will search Solr by keyword and then look up the matching articles in MySQL to retrieve the additional information.
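The two-step flow could be sketched roughly as follows. This is only an illustration: `search_article_ids` and `FAKE_INDEX` are hypothetical stand-ins for the Solr query, an in-memory SQLite database stands in for MySQL, and the `articles` table with its columns is invented for the example.

```python
import sqlite3

# Hypothetical stand-in for the search index: keyword -> ids by relevance.
FAKE_INDEX = {"linux": [3, 1], "mysql": [2]}


def search_article_ids(keyword, limit=10):
    """Placeholder for the Solr query; returns only article ids."""
    return FAKE_INDEX.get(keyword.lower(), [])[:limit]


def fetch_metadata(conn, ids):
    """Load metadata rows for the ids, keeping the search ranking order."""
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT id, title, source FROM articles WHERE id IN ({placeholders})",
        ids,
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # IN (...) does not guarantee order, so re-sort by the search ranking.
    return [by_id[i] for i in ids if i in by_id]


def search(conn, keyword):
    """Step 1: ids from the index. Step 2: details from the database."""
    return fetch_metadata(conn, search_article_ids(keyword))


# Demo data (SQLite in memory as an assumption, in place of MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, source TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [(1, "Kernel notes", "blog"), (2, "Index tuning", "wiki"), (3, "Shell tips", "news")],
)

for row in search(conn, "linux"):
    print(row)
```

Fetching all ids in a single `IN (...)` query keeps the lookup to one database round trip per search.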

So, would this be a good approach?

If most searches will be only on the articles added in the last month, would it be a good idea to keep two databases, one with the articles added in the last month for most searches and another with all the articles?

If you have any tips/tricks on how to improve this, it would be greatly appreciated.

Thanks in advance!

Stelian Matei asked Feb 13 '12 11:02



1 Answer

I think your solution is quite good. I would consider putting a memcached instance in front of SOLR if you want faster responses on common queries.
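To make the caching idea concrete, here is a minimal cache-aside sketch. It is an assumption-laden illustration: `TTLCache` is a small in-process stand-in for memcached, and `slow_search` is a hypothetical placeholder for the real SOLR query.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for memcached with per-entry expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]   # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)


def cached_search(cache, keyword, search_fn):
    """Cache-aside: try the cache first, fall back to the search backend."""
    key = keyword.strip().lower()   # normalise so "Linux " and "linux" share a key
    hit = cache.get(key)
    if hit is not None:
        return hit
    result = search_fn(key)
    cache.set(key, result)
    return result


calls = []


def slow_search(keyword):
    """Hypothetical placeholder for the real search query."""
    calls.append(keyword)
    return [f"article about {keyword}"]


cache = TTLCache(ttl_seconds=60)
first = cached_search(cache, "Linux ", slow_search)
second = cached_search(cache, "linux", slow_search)
print(first, second, len(calls))
```

With 200,000 new articles a day, the expiry time decides how stale a cached result list may get, so it is worth keeping short for queries over recent articles.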

I am not sure about the two databases. You would have to weigh the performance benefit against the burden of moving records from the first database to the second as they age. I doubt there is a huge benefit, but that is just a gut feeling, so don't take my word for it: run experiments.

Also, have you considered that you may need a horizontally scalable solution if your dataset becomes very large?

Savino Sguera answered Sep 21 '22 21:09