
Implementing a massive search application

We run an email service that hosts close to 10,000 domains, and we store the message headers in a SQL Server database.

I need to implement an application that will search the message body for keywords. The messages are stored as files on a NAS storage system.

As a proof of concept, I implemented a SQL Server based search system where I would parse each message and store all the words in a database table along with the memberid and the messageid. The database was on a separate server from the headers database.
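To illustrate the approach, here is a minimal sketch of that kind of word-per-row indexing. The function name `index_rows`, the tokenizing regex, and the choice to store each distinct word once per message are assumptions made for the example, not the code that was actually used.

```python
import re

# Hypothetical tokenizer: lowercase runs of letters, digits, and apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")

def index_rows(member_id, message_id, body):
    """Return one (word, member_id, message_id) row per distinct word in body."""
    words = set(WORD_RE.findall(body.lower()))
    return [(word, member_id, message_id) for word in sorted(words)]

# Sample data is made up; each tuple would become one row in the search table.
rows = index_rows(7, 1001, "Meeting moved to Friday. Friday works for me.")
for row in rows:
    print(row)
```

Even this short message produces seven rows, which is why the table grows so quickly once whole mailboxes are processed.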

The problem with that system was that I ended up with a table with 600 million rows after processing messages on just one domain. Obviously this is not a very scalable solution.

Since the headers are stored in a SQL Server table, I will need to join the messageIDs returned by the search application to the header table in order to display the messages that contain the searched-for keywords.

Any suggestions on a better architecture? Is there a better alternative to using SQL Server? We receive over 20 million messages a day. We are a small company with limited resources with respect to servers, maintenance, etc.

Thanks

klork asked May 23 '26 05:05



1 Answer

Have a look at Hadoop. It's a complete "map-reduce" framework for working with huge datasets, inspired by Google's MapReduce. I think (but I could be wrong) Rackspace is using it for email search for their clients.
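As a rough sketch of the map-reduce idea applied to this problem, the following plain Python builds an inverted index (word to list of message ids). It does not use Hadoop's API; the function names `map_message`, `reduce_postings`, and `build_index`, the tokenizer, and the sample messages are all hypothetical, and a real job would distribute the map and reduce steps across machines.

```python
from collections import defaultdict
import re

# Hypothetical tokenizer: lowercase runs of letters, digits, and apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")

def map_message(message_id, body):
    """Map step: emit a (word, message_id) pair for each distinct word."""
    for word in set(WORD_RE.findall(body.lower())):
        yield word, message_id

def reduce_postings(pairs):
    """Reduce step: group message ids by word into sorted posting lists."""
    postings = defaultdict(set)
    for word, message_id in pairs:
        postings[word].add(message_id)
    return {word: sorted(ids) for word, ids in postings.items()}

def build_index(messages):
    """Run the map step over every message, then reduce all emitted pairs."""
    pairs = []
    for message_id, body in messages.items():
        pairs.extend(map_message(message_id, body))
    return reduce_postings(pairs)

# Made-up sample data keyed by message id.
messages = {
    1001: "Invoice attached for March",
    1002: "March meeting moved to Friday",
    1003: "Please resend the invoice",
}
index = build_index(messages)
print(index["invoice"])  # message ids whose body contains "invoice"
print(index["march"])    # message ids whose body contains "march"
```

Storing one posting list per word, rather than one row per word per message, keeps the index far smaller, and the message ids returned by a lookup are what would be joined against the header table.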

lubos hasko answered May 24 '26 21:05



