Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to take advantage of Sphinx multiple indexes to improve performance

I am reading a book on Sphinx and it mentions that in order to take advantage of multiple cores and the Sphinx technology itself, I'll inevitably have to split a big index into smaller ones and query them in a multi-index query. However the book doesn't go into any further details.

What are the general strategies for this? Do you simply split it in a UNION-like way, e.g.

index1: SELECT ... FROM table LIMIT 0, 1000
index2: SELECT ... FROM table LIMIT 1000, 1000
...

And then you rebuild these pieces from time to time. When search is made different cores will process these indexes in parallel? Or is it something different like separating existing items in bigger index and newer items that are being added into a smaller index? Or separating text fields into one index and attributes into another?

like image 639
Eugene Avatar asked Jun 03 '12 22:06

Eugene


1 Answers

Great question.

Sphinx is utilizing one CPU core per single local index search and one CPU core for building one index while indexing

If you have two indexes you could run two indexers at the same time and utilize two CPU cores. Please beware that indexing is IO intensive task so don't run too many indexers.

Once you have two (or more) indexes you could search them at the same time by mentioning all of them in search query or using distributed index like this:

index index_main
{
        type            = distributed
        local           = index1
        local           = index2
}

where index1 and index2 are separate indexes. In this case you can search against index_main and sphinx will provide you aggregated results from both indexes

Regarding splitting data you could utilize all the techniques you want including splitting records by range, by hash or by attribute value and all the above in any combination.

My favorite one is to use modulo to determine index number like this:

For the first index:

sql_query       = SELECT id, title, description FROM <my_table> WHERE (id % 2) = 0

For second:

sql_query       = SELECT id, title, description FROM <my_table> WHERE (id % 2) = 1

This method has some drawbacks but in general it's a good start if you don't have lots of data.

like image 91
vfedorkov Avatar answered Oct 06 '22 00:10

vfedorkov