Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sphinx vs. MySql - Search through list of friends (efficiency/speed)

I'm porting my application searches over to Sphinx from MySQL and am having a hard time figuring this one out, or if it even needs to be ported at all (I really want to know if it's worth using sphinx for this specific case for efficiency/speed):

users
uid uname
  1    alex
  2    barry
  3    david

friends
uid | fid
  1     2
  2     1
  1     3
  3     1

Details are:
- InnoDB
- users: index on uid, index on uname
- friends: combined index on uid,fid

Normally, to search all of alex's friends with mysql:

$uid = 1
$searchstr = "%$friendSearch%";
$query = "SELECT f.fid, u.uname FROM friends f 
          JOIN users u ON f.fid=u.uid
          WHERE f.uid=:uid AND u.uname LIKE :friendSearch";
$friends = $dbh->prepare($query);
$friends->bindParam(':uid', $uid, PDO::PARAM_INT);
$friends->bindParam(':friendSearch', $searchstr, PDO::PARAM_STR);
$friends->execute();

Is it any more efficient to find alex's friends with sphinx vs mysql or would that be an overkill?
If sphinx would be faster for this as the list hits thousands of people, what would the indexing query look like? How would I delete a friendship that no longer exists with sphinx as well, can I have a detailed example in this case? Should I change this query to use Sphinx?

like image 719
Wonka Avatar asked Aug 01 '12 15:08

Wonka


1 Answers

Ok this is how I see this working.

I have the exact same problem with MongoDB. MongoDB "offers" searching capabilities but just like MySQL you should never use them unless you wanna be choked with IO, CPU and memory problems and be forced to use a lot more servers to cope with your index than you normally would.

The whole idea if using Sphinx (or another search tech) is to lower cost per server by having a performant index searcher.

Sphinx however is not a storage engine. It is not as simple to query exact relationships across tables, they have remmedied this a little with SphinxQL but due to the nature of the full text index it still doesn't do an integral join like you would get in MySQL.

Instead I would store the relationships within MySQL but have an index of "users" within Sphinx.

In my website I personally have 2 indexes:

  • main (houses users,videos,channels and playlists)
  • help (help system search)

These are delta updated once every minute. Since realtime indexes are still bit experimental at times and I personally have seen problems with high insertion/deletion rates I keep to delta updates. So I would use a delta index to update the main searchable objects of my site since this is less resource intensive and more performant than realtime indexes (from my own tests).

Do note inorder to process deletions and what not your Sphinx collection through delta you will need a killlist and certain filters for your delta index. Here is an example from my index:

source main_delta : main
{
    sql_query_pre = SET NAMES utf8
    sql_query_pre =
    sql_query = \
        SELECT id, deleted,  _id, uid, listing, title, description, category, tags, author_name, duration, rating, views, type, adult, videos, UNIX_TIMESTAMP(date_uploaded) AS date_uploaded \
        FROM documents \
        WHERE id>( SELECT max_doc_id FROM sph_counter WHERE counter_id=1 ) OR update_time >( SELECT last_index_time FROM sph_counter WHERE counter_id=1 )

    sql_query_killlist = SELECT id FROM documents WHERE update_time>=( SELECT last_index_time FROM sph_counter WHERE counter_id=1 ) OR deleted = 1
}

This processes deletions and additions once every minute which is pretty much realtime for a real web app.

So now we know how to store our indexes. I need to talk about the relationships. Sphinx (even though it has SphinxQL) won't do integral joins across data so I would personally recommend doing the relationship outside of Sphinx, not only that but as I said this relationship table will get high load so this is something that could impact the Sphinx index.

I would do a query to pick out all ids and using that set of ids use the "filter" method on the sphinx API to filter the main index down to specific document ids. Once this is done you can search in Sphinx as normal. This is the most performant method I have found to date of dealing with this.

The key thing to remember at all times is that Sphinx is a search tech while MySQL is a storage tech. Keep that in mind and you should be ok.

Edit

As @N.B said (which I overlooked in my answer) Sphinx does have SphinxSE. Although primative and still in sort of testing stage of its development (same as realtime indexes) it does provide an actual MyISAM/InnoDB type storage to Sphinx. This is awesome. However there are caveats (as with anything):

  • The language is primative
  • The joins are primative

However it can/could do the job your looking for so be sure to look into it.

like image 136
Sammaye Avatar answered Oct 05 '22 20:10

Sammaye