I have a large dataset with 5M rows. One of the fields in the dataset is 'article_title', which I'd like to search in real-time for an autocomplete feature that I'm building on my site.
I've been experimenting with MySQL and MongoDB as potential DB solutions. Both perform well when an index is used, for example for 'something%', but I need to match titles within a string, as in '%something%'.
Both MySQL and MongoDB took about 0.01 seconds for an indexed prefix search, and about 6 seconds for a full string-in-string search.
I realize that a string-in-string search requires scanning the entire table, so what is the common approach to this problem? Solr and Sphinx seem like overkill for this one problem, so I'd like to avoid them if possible.
If I got a box with 2 GB of RAM and a 40GB SSD (which is what I can afford at the moment), would I be able to get sub-second response time? Thanks in advance.
--
UPDATE: I tried a fulltext index and while the results are very fast, it doesn't really satisfy a string-in-string search ("presiden" doesn't match "president"). I'm looking for ways to match a string-in-string with a 5M row dataset.
In MySQL, you can create a full-text index. Put simply, a full-text index makes it fast to find words anywhere inside a text column by indexing each word separately. To create one, you would write:
alter table YourTable add fulltext index(article_title);
After that you can search with:
select * from YourTable where match(article_title) against ('something');
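Regarding the "presiden" case from the update: MySQL's boolean full-text mode accepts a trailing * wildcard, which matches words that start with the given prefix. A minimal sketch, reusing the YourTable and article_title names from above and assuming the full-text index has already been created (the LIMIT is just an illustrative choice for an autocomplete list):

```sql
-- Matches titles containing any word that starts with "presiden",
-- e.g. "president" or "presidential".
SELECT article_title
FROM YourTable
WHERE MATCH(article_title) AGAINST ('presiden*' IN BOOLEAN MODE)
LIMIT 10;
```

The wildcard is only supported at the end of a term, so this covers word prefixes but not matches in the middle of a word ("residen" would still not find "president").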
It seems that MongoDB also has text indexes. I imagine the indexing can be fine-tuned in either case, so you'll have to test which is better for your case.
A regular index, which is typically implemented as a B-tree, works from left to right. A query like something% can use it because the left side of the indexed value is known. A query like %something or %something% cannot use such an index.
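One way to check this on your own data is to compare the query plans with EXPLAIN. This is only a sketch: idx_title is a hypothetical index name, and it assumes article_title is a VARCHAR column short enough to be indexed in full.

```sql
-- Hypothetical regular (B-tree) index on the title column.
CREATE INDEX idx_title ON YourTable (article_title);

-- Prefix pattern: the optimizer can turn this into a range lookup on idx_title.
EXPLAIN SELECT article_title FROM YourTable
WHERE article_title LIKE 'something%';

-- Leading wildcard: no range can be derived from the pattern, so every
-- row (or every index entry) has to be examined.
EXPLAIN SELECT article_title FROM YourTable
WHERE article_title LIKE '%something%';
```

For the first query the plan should typically report a range access on idx_title, while the second usually falls back to a full table scan or a full index scan, which would be consistent with the 0.01 second versus 6 second timings in the question.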
A full-text index is different in that it indexes individual words and skips common ones (stop words), such as "the". By default, MySQL's full-text index also leaves out very short words (3 characters or fewer with MyISAM, whose ft_min_word_len defaults to 4).
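If short words or stop words matter for your titles, you can inspect the relevant server settings before deciding. A small sketch, assuming a reasonably recent MySQL version (5.6 or later for the InnoDB parts):

```sql
-- Minimum indexed word length for MyISAM full-text indexes (default 4).
SHOW VARIABLES LIKE 'ft_min_word_len';

-- Minimum indexed word length for InnoDB full-text indexes (default 3).
SHOW VARIABLES LIKE 'innodb_ft_min_token_size';

-- Default stop word list used by InnoDB full-text indexes.
SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;
```

Both length settings are read at server startup, so changing them means restarting the server and rebuilding the full-text index afterwards.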
For small cases the built-in full-text index will work just fine. Built-in full-text indexes only take you so far, though, so at some point you may need a dedicated solution such as Elasticsearch or Sphinx.