Nowadays when starting a web/mobile app project in which search is going to be an important variable. Is it better to go with Lucene from the start or quickly deploy a MySQL based solution and hope for the best?
I had the same decision in November 2010. I'm a friend of mysql and tried to build an search application on mysql first - which works well... ...and fast (i thought it was fast): searching in 200.000 documents (in not more than 2-3 second)
I avoided spending time to lucene/solr, because i would like to use this time for developing the application. And: lucene was new for me... I don't know, if it is good enough, i don't know what it is.... Finally: You can't change the habits of a lifetime.
However, i run in different problems with fuzzy search (which is difficult to implement in mysql) or "more like this" (which have to be coded from scrat in an application using mysql or simple use that "more like this" solr-feature out of the box). Finally the number of documents rises up to a million and mysql needs now more than 15 seconds to search into the documents.
So i decided to start with lucene and it feels like i opened a door to a new world. Lot's of features (i hardly coded application-features) are now provided from solr and working out of the box. The fulltext searches are much, much faster: less than 50ms in 1 million Documents, less than 1ms, if it is cached.
So the invested time has paid off.
So if you think about to make an fulltext search: take lucene, if you have mor than a couple of data. By the way: i'm using an hybrid construct: holding the data in mysql and lucene is only an index with (nearly) no stored data (to keep that index small and fast).
generically speaking, if you are going to have full text searches, you will most surely need lucene or sphinx + mysql (or lucene + mysql, storing the indexable fields in lucene, and returning an id for a mysql row). either of them are excellent choices.
if you are going to do "normal" searches (i.e: integer or char columns or date), mysql partitoning will suffice.
you need to specify what are you going to search for. and how often you will be reindexing your db (if you are going to reindex a lot, i'd go with sphinx)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With