Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Full-text indexing sluggish. Looking for alternatives

I have a table that I've created a Full Text Catalog on. The table has just over 6000 rows. I've added two columns to the index. The first could be considered a unique identifier of sorts and the second could be considered the content for that item (there are 11 other columns in my table that aren't part of the Full Text Catalog). Here is an example of a couple of rows:

TABLE: data_variables
ROW    unique_id    label
1      A100d1       Personal preference of online shopping sites
2      A100d2       Shopping behaviors for adults in household

In my web application on the front end, I have a text box that the user can type into to get a list of items that match whatever terms they're searching for in the UNIQUE ID or LABEL columns. So, for example, if the user typed in sho or a100 then a list would be populated with both of the rows above. If they typed in behav then a list would be populated with only row 2 above.

This is done via an Ajax request on each keyup. PHP calls a Stored Procedure on the SQL server that looks like:

SELECT TOP 50 dv.id, dv.id + ': ' + dv.label, 
              dv.type_id, dv.grouping, dv.friendly_label
FROM          data_variables dv
WHERE         (CONTAINS((dv.unique_id, dv.label), @search))

(@search is the text from the user that is passed into the Stored Procedure.)

I've noticed that this gets pretty sluggish, especially when I wasn't using TOP 50 in the query.

What I'm looking for is a way to speed this up either directly on the SQL Server or by abandoning the full-text indexing idea and using jQuery to search through an array of the searchable items on the client-side. I've looked a bit into the jQuery AutoComplete stuff and some other jQuery plugins for AutoComplete, but haven't yet tried to mock up anything. That would be my next step, but I wanted to check here first to see what advice I would get.

Thanks in advance.

like image 721
tptcat Avatar asked Feb 28 '13 15:02

tptcat


2 Answers

Several suggestions, based around the fact that you have only 6000 rows, so the database should eat this alive.

A. Try using Like operator, just in case it helps. Not expecting it too, but pretty trivial to try. There is something else going on here overall for you to detect this is slow given these small volumes.

B. can you cache queries in advance? With 6000 rows, there are probably only 36*36 combinations of 2 character queries, which should take virtually no memory and save the database any work.

C. Moving the selection out to the client is a good idea, depends on how big the 6000 rows are overall, vs network latency for individual lookups.

D. Combining b and c will give you really good performance I suspect, but with some coding effort required. If the server maintains a list of all single character results in cache, and clients download the letter cache set after first keystroke, then they potentially have a subset of all rows, but won't need to do more network IO for additional keystrokes.

like image 89
rlb Avatar answered Nov 05 '22 06:11

rlb


I would advise against a LIKE, unless you're using a linear index (left-to-right) and you're doing queries like LIKE 'work%'. If you're doing something like LIKE '%word%' a regular index isn't going to help you. You typically want to use a Full-Text index when you want to search for words inside a paragraph.

With a lot of data, typically the built-in Full-Text engines in databases aren't very stealer. For the best performance you typically have to go with an external solution that is built specifically for Full-Text.

Some options are Sphinx, Solr, and elasticsearch, just to name a few. I wouldn't say that any of these options are better than the other. There are definitely pros and cons to consider:

  • What kind of data do you have?
  • What language support do these solutions have?
  • What database engines do these solutions support?

The best thing you can do is benchmark these solutions against your existing data. Testing each and every individual component (unit testing) can help you identify the real problems and help you find good solutions.

like image 27
Luke Avatar answered Nov 05 '22 05:11

Luke