Design a very large database for searching text

Tags: database, full-text-search, database-design

We need to design a system which allows users to search by different keywords in large texts and also, in the future, create some basic reports regarding the frequency of that keyword in all the articles over a period.

We will have:

about 200,000 articles added every day
each article text is about 2KB
articles are stored for 6 months
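
For a rough sense of scale, these figures can be multiplied out. This is only a back-of-the-envelope sketch: it assumes "6 months" means about 183 days and counts raw article text only, not Solr index overhead or the MySQL rows.

```python
# Back-of-the-envelope sizing from the figures above.
ARTICLES_PER_DAY = 200_000
ARTICLE_SIZE_KB = 2
RETENTION_DAYS = 183  # assumption: "6 months" taken as roughly 183 days

# Number of articles held at any one time once retention is reached.
total_articles = ARTICLES_PER_DAY * RETENTION_DAYS

# Raw text volume, ignoring index structures and metadata.
daily_mb = ARTICLES_PER_DAY * ARTICLE_SIZE_KB / 1024
total_gb = total_articles * ARTICLE_SIZE_KB / (1024 * 1024)

print(f"articles retained: {total_articles:,}")
print(f"raw text per day:  {daily_mb:.0f} MB")
print(f"raw text retained: {total_gb:.1f} GB")
```

That works out to roughly 36.6 million articles and about 70 GB of raw text at steady state.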

To do that, we came up with the following solution:

create a Solr index to store the article text
use a MySQL database to store the additional information about each article

The system will search Solr by keyword and then look up the matching articles in MySQL to retrieve the additional information.
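
To make the two-step flow concrete, here is a minimal sketch of what we have in mind. Everything named here is an assumption for illustration: `search_index` stands in for the Solr query and only returns article ids, the `articles` table and its columns are hypothetical, and `sqlite3` is used in place of MySQL so the example runs on its own.

```python
import sqlite3
from typing import Callable, Dict, List


def fetch_details(conn: sqlite3.Connection, ids: List[int]) -> Dict[int, dict]:
    """Step 2: load metadata for the matched ids in one query."""
    if not ids:
        return {}
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, title, source FROM articles WHERE id IN ({placeholders})",
        ids,
    ).fetchall()
    return {row[0]: {"title": row[1], "source": row[2]} for row in rows}


def search(keyword: str,
           search_index: Callable[[str], List[int]],
           conn: sqlite3.Connection) -> List[dict]:
    """Step 1: ask the index for ids. Step 2: enrich them from the database."""
    ids = search_index(keyword)
    details = fetch_details(conn, ids)
    # Keep the order returned by the index (usually relevance order).
    return [{"id": i, **details[i]} for i in ids if i in details]


# Stand-in data: sqlite3 replaces MySQL, fake_index replaces Solr.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, source TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [(1, "Solr basics", "blog"), (2, "MySQL tuning", "wiki"), (3, "Caching", "blog")],
)


def fake_index(keyword: str) -> List[int]:
    return [3, 1] if keyword == "blog" else []


results = search("blog", fake_index, conn)
print(results)
```

The point of the sketch is that the database is hit once per search with an `IN (...)` lookup on the primary key, not once per result.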

So, would this be a good approach?

If most searches will be only on the articles added in the last month, would it be a good idea to keep two databases, one with the articles added in the last month for most searches and another with all the articles?

If you have any tips/tricks on how to improve this, it would be greatly appreciated.

Thanks in advance!


asked Feb 13 '12 11:02

Stelian Matei

1 Answer

I think your solution is quite good. I would evaluate putting a memcached instance in front of Solr if you want faster responses on common queries.
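
A rough sketch of the caching idea, using the cache-aside pattern. To keep it runnable without a server, an in-process `TTLCache` class stands in for memcached, and `search_index` is a hypothetical function standing in for the Solr call; neither is a real client API.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class TTLCache:
    """Tiny in-process stand-in for memcached (assumption, not a real client)."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, List[int]]] = {}

    def get(self, key: str) -> Optional[List[int]]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # entry is stale, drop it
            return None
        return value

    def set(self, key: str, value: List[int]) -> None:
        self._store[key] = (time.monotonic() + self.ttl, value)


def cached_search(query: str,
                  cache: TTLCache,
                  search_index: Callable[[str], List[int]]) -> List[int]:
    """Cache-aside: try the cache first, fall back to the index on a miss."""
    key = query.strip().lower()  # normalize so "Solr" and "solr " share an entry
    hit = cache.get(key)
    if hit is not None:
        return hit
    result = search_index(query)
    cache.set(key, result)
    return result


# Usage: count how often the (fake) index is actually queried.
calls = []


def fake_index(query: str) -> List[int]:
    calls.append(query)
    return [42, 7]


cache = TTLCache(ttl_seconds=60)
first = cached_search("Solr", cache, fake_index)
second = cached_search("solr ", cache, fake_index)
print(first, second, len(calls))
```

A short TTL matters here because new articles arrive all day, so cached result lists go stale quickly.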

I am not sure about the two databases. You would have to weigh the performance benefit against the burden of moving records from the first database to the second as they age. I doubt there is a huge benefit, but that is just a gut feeling; don't take my word for it, run experiments.
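
One baseline worth including in such an experiment is a single index with a date filter on the query. The sketch below only builds the request parameters; the field names `text` and `added_at`, the core name `articles`, and the assumption that the keyword is a single term with no special characters are all mine. `fq` and Solr's date math (`NOW/DAY-1MONTH`) are standard Solr query features.

```python
from urllib.parse import urlencode


def recent_search_params(keyword: str, months: int = 1) -> dict:
    """Parameters for a keyword search limited to recently added articles."""
    return {
        "q": f"text:{keyword}",  # hypothetical full-text field
        # Filter query on a hypothetical date field. Rounding NOW to the day
        # keeps the filter string identical across requests made the same day.
        "fq": f"added_at:[NOW/DAY-{months}MONTH TO *]",
        "rows": 20,
        "wt": "json",
    }


def recent_search_url(base_url: str, keyword: str, months: int = 1) -> str:
    """Full select URL for the parameters above."""
    return f"{base_url}/select?{urlencode(recent_search_params(keyword, months))}"


# Hypothetical local core, for illustration only.
url = recent_search_url("http://localhost:8983/solr/articles", "election")
print(url)
```

If the filtered query on the full index is already fast enough, the second database is probably not worth the extra moving parts.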

Also, have you considered that you may need a horizontally scalable solution if your dataset becomes very large?

answered Sep 21 '22 21:09

Savino Sguera
