Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

efficient serverside autocomplete

First off all I know:

Premature optimization is the root of all evil

But I think wrong autocomplete can really blow up your site.

I would to know if there are any libraries out there which can do autocomplete efficiently(serverside) which preferable can fit into RAM(for best performance). So no browserside javascript autocomplete(yui/jquery/dojo). I think there are enough topic about this on stackoverflow. But I could not find a good thread about this on stackoverflow (maybe did not look good enough).

For example autocomplete names:

names:[alfred, miathe, .., ..]

What I can think off:

  • simple SQL like for example: SELECT name FROM users WHERE name LIKE al%.
    • I think this implementation will blow up with a lot of simultaneously users or large data set, but maybe I am wrong so numbers(which could be handled) would be cool.
  • Using something like solr terms like for example: http://localhost:8983/solr/terms?terms.fl=name&terms.sort=index&terms.prefix=al&wt=json&omitHeader=true.
    • I don't know the performance of this so users with big sites please tell me.
  • Maybe something like in memory redis trie which I also haven't tested performance on.
  • I also read in this thread about how to implement this in java (lucene and some library created by shilad)

What I would like to hear is implementation used by sites and numbers of how well it can handle load preferable with:

  • Link to implementation or code.
  • numbers to which you know it can scale.
  • It would be nice if it could be accesed by http or sockets.

Many thanks,
Alfred

like image 356
Alfred Avatar asked Jan 08 '10 01:01

Alfred


1 Answers

Optimising for Auto-complete

Unfortunately, the resolution of this issue will depend heavily on the data you are hoping to query.

LIKE queries will not put too much strain on your database, as long as you spend time using 'EXPLAIN' or the profiler to show you how the query optimiser plans to perform your query.

Some basics to keep in mind:

  • Indexes: Ensure that you have indexes setup. (Yes, in many cases LIKE does use the indexes. There is an excellent article on the topic at myitforum. SQL Performance - Indexes and the LIKE clause ).

  • Joins: Ensure your JOINs are in place and are optimized by the query planner. SQL Server Profiler can help with this. Look out for full index or full table scans

Auto-complete sub-sets

Auto-complete queries are a special case, in that they usually works as ever decreasing sub sets.

  • 'name' LIKE 'a%' (may return 10000 records)
  • 'name' LIKE 'al%' (may return 500 records)
  • 'name' LIKE 'ala%' (may return 75 records)
  • 'name' LIKE 'alan%' (may return 20 records)

If you return the entire resultset for query 1 then there is no need to hit the database again for the following result sets as they are a sub set of your original query.

Depending on your data, this may open a further opportunity for optimisation.

like image 97
Jon Winstanley Avatar answered Oct 21 '22 16:10

Jon Winstanley