
Sphinx without using an auto_increment id

I am currently planning to create a big database (2+ million rows) with a variety of data from separate sources. I would like to avoid structuring the database around auto_increment IDs, partly to help prevent sync issues with replication, and partly because each item inserted will have an alphanumeric product code that is guaranteed to be unique, so it makes more sense to me to use that instead.

I am looking for a search engine to index this database, and Sphinx looks rather appealing because it is designed around indexing relational databases. However, the various tutorials and documentation I have looked at seem to show database designs that depend on an auto_increment field in one form or another, and the documentation makes a rather bold statement that document IDs must be 32/64-bit integers only or things break.

Is there a way to have a database indexed by Sphinx without using auto_increment fields as the ID?

asked Oct 29 '09 at 16:10 by squeeks



4 Answers

Sure, that's easy to work around. If you need to make up your own IDs just for Sphinx and you don't want them to collide, you can do something like this in your sphinx.conf (example code for MySQL):

source products {

  # Use a variable to store a throwaway ID value
  sql_query_pre = SELECT @id := 0 

  # Keep incrementing the throwaway ID.
  # "code" is present twice because Sphinx does not full-text index attributes
  sql_query = SELECT @id := @id + 1, code AS code_attr, code, description FROM products

  # Return the code so that your app will know which records were matched
  # this will only work in Sphinx 0.9.10 and higher!
  sql_attr_string = code_attr  
}

The only problem is that you still need a way to know which records were matched by your search. Sphinx will return the ID (which is now meaningless) plus any columns that you mark as "attributes".

Sphinx 0.9.10 and above will be able to return your product code to you as part of the search results because it supports string attributes.
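A minimal sketch of the application side, assuming the result layout of the Python `sphinxapi.py` client that ships with Sphinx, where `SphinxClient.Query()` returns a dict (or `None` on error) whose `"matches"` list holds dicts with `"id"`, `"weight"` and `"attrs"` keys; check this against your client version. The helper `product_codes` is hypothetical: it ignores the throwaway ID and reads the `code_attr` string attribute declared in the config above.

```python
from typing import Any, Dict, List, Optional


def product_codes(result: Optional[Dict[str, Any]], attr: str = "code_attr") -> List[str]:
    """Return the product code of each match, ignoring the meaningless document ID.

    Assumes the sphinxapi.py result layout: result["matches"] is a list of
    {"id": ..., "weight": ..., "attrs": {...}} dicts.
    """
    if not result:
        # Query() returns None when the search fails; treat that as "no hits".
        return []
    return [match["attrs"][attr] for match in result.get("matches", [])]
```

With a real client the input would be whatever `Query()` returned; the product codes can then be used to load the full rows from MySQL.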

0.9.10 is not an official release yet, but it is looking great. It looks like Zawodny is running it over at Craigslist, so I wouldn't be too nervous about relying on this feature.

answered Nov 01 '22 at 14:11 by outcassed



Sphinx only requires IDs to be unique integers; it doesn't care whether they are auto-incremented or not, so you can roll your own logic. For example, generate integer hashes of your string keys.
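A rough sketch of what that could look like in Python. The helper names `code_to_doc_id` and `build_id_map` are hypothetical, and BLAKE2b truncated to 63 bits is just one possible choice; the 63-bit cap is an assumption so the value also fits a signed 64-bit column, and it requires a Sphinx build with 64-bit document IDs. Hashing is not collision-free, so the sketch checks for collisions while building the ID-to-code map. For scale: with n = 2 million keys and a b-bit hash, the expected number of colliding pairs is roughly n^2 / 2^(b+1), about 466 for a 32-bit hash such as MySQL's `CRC32()`, but only about 2e-7 for 63 bits.

```python
import hashlib
from typing import Dict, Iterable

ID_BITS = 63  # assumption: keep IDs below 2**63 so they also fit a signed 64-bit column


def code_to_doc_id(code: str) -> int:
    """Deterministically map an alphanumeric product code to a non-zero integer ID."""
    digest = hashlib.blake2b(code.encode("utf-8"), digest_size=8).digest()
    doc_id = int.from_bytes(digest, "big") & ((1 << ID_BITS) - 1)
    if doc_id == 0:
        # Sphinx does not accept 0 as a document ID; practically impossible, but check anyway.
        raise ValueError(f"product code {code!r} hashed to 0")
    return doc_id


def build_id_map(codes: Iterable[str]) -> Dict[int, str]:
    """Build the ID -> code map used to resolve search hits, failing loudly on collisions."""
    id_to_code: Dict[int, str] = {}
    for code in codes:
        doc_id = code_to_doc_id(code)
        existing = id_to_code.get(doc_id)
        if existing is not None and existing != code:
            raise ValueError(f"hash collision: {existing!r} and {code!r} both map to {doc_id}")
        id_to_code[doc_id] = code
    return id_to_code
```

Because the ID depends only on the product code, it stays the same across reindexes and across replicated servers. You could store it in its own indexed unsigned BIGINT column and select it as the first column of `sql_query`.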

answered Nov 01 '22 at 13:11 by user187291



Sphinx doesn't depend on auto-increment; it just needs unique integer document IDs. Maybe you can add a surrogate unique integer ID to the tables for Sphinx to work with, since integer lookups are generally faster than alphanumeric ones. By the way, how long is your alphanumeric product code? Any samples?
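A minimal sketch of that surrogate-key layout, using Python's built-in `sqlite3` so it runs anywhere; a MySQL schema would look similar, but the table and function names here are hypothetical and not the answerer's schema. The product code stays the primary key, while `sphinx_id` is an ordinary `INTEGER NOT NULL UNIQUE` column (not auto-incremented), so you assign it yourself, for example with a hash as suggested in the answer above. The `UNIQUE` constraint enforces the one property Sphinx actually needs.

```python
import sqlite3
from typing import List, Sequence


def make_catalog() -> sqlite3.Connection:
    """Create an in-memory products table with a natural key plus a surrogate Sphinx ID."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE products (
               code        TEXT PRIMARY KEY,         -- natural key from the data source
               sphinx_id   INTEGER NOT NULL UNIQUE,  -- surrogate key, used only as the Sphinx document ID
               description TEXT
           )"""
    )
    return conn


def add_product(conn: sqlite3.Connection, code: str, sphinx_id: int, description: str) -> None:
    # The caller supplies sphinx_id; a duplicate raises sqlite3.IntegrityError.
    conn.execute(
        "INSERT INTO products (code, sphinx_id, description) VALUES (?, ?, ?)",
        (code, sphinx_id, description),
    )


def codes_for_matches(conn: sqlite3.Connection, doc_ids: Sequence[int]) -> List[str]:
    """Translate document IDs returned by a search back into product codes, keeping result order."""
    if not doc_ids:
        return []
    placeholders = ",".join("?" * len(doc_ids))
    rows = conn.execute(
        f"SELECT sphinx_id, code FROM products WHERE sphinx_id IN ({placeholders})",
        list(doc_ids),
    ).fetchall()
    by_id = dict(rows)
    return [by_id[i] for i in doc_ids if i in by_id]
```

The Sphinx `sql_query` would then select `sphinx_id` as its first column, and the application resolves hits with an integer lookup like `codes_for_matches`.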

answered Nov 01 '22 at 13:11 by Sabeen Malik



I think it's possible to generate an XML stream from your data and then create the IDs in software (Ruby, Java, PHP).

Take a look at http://github.com/burke/mongosphinx
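Sphinx's XML route is the `xmlpipe2` source type (`type = xmlpipe2` in the source definition, plus an `xmlpipe_command` whose standard output is the XML). Below is a minimal sketch of such a generator in Python. It assumes a Sphinx version whose xmlpipe2 schema accepts `type="string"` attributes, and the function and field names are hypothetical. IDs are assigned in software with a simple counter, so the database needs no auto_increment column.

```python
import sys
from typing import Iterable, Tuple
from xml.sax.saxutils import escape, quoteattr


def build_xmlpipe2(products: Iterable[Tuple[str, str]]) -> str:
    """Render (code, description) pairs as an xmlpipe2 document set.

    Document IDs are assigned here (a 1-based counter), not by the database.
    """
    parts = [
        '<?xml version="1.0" encoding="utf-8"?>',
        "<sphinx:docset>",
        "<sphinx:schema>",
        '<sphinx:field name="code"/>',
        '<sphinx:field name="description"/>',
        # Assumption: string attributes require a Sphinx version that supports them.
        '<sphinx:attr name="code_attr" type="string"/>',
        "</sphinx:schema>",
    ]
    for doc_id, (code, description) in enumerate(products, start=1):
        parts.append(f"<sphinx:document id={quoteattr(str(doc_id))}>")
        parts.append(f"<code>{escape(code)}</code>")  # full-text field
        parts.append(f"<description>{escape(description)}</description>")
        parts.append(f"<code_attr>{escape(code)}</code_attr>")  # returned with each match
        parts.append("</sphinx:document>")
    parts.append("</sphinx:docset>")
    return "\n".join(parts) + "\n"


if __name__ == "__main__":
    # Hypothetical sample rows; a real script would read them from the database.
    sample = [("AB-1234", "Blue widget"), ("XY-9876", "Red & white widget")]
    sys.stdout.write(build_xmlpipe2(sample))
```

Because the counter depends on input order, IDs change if rows come back in a different order between runs. Sorting by product code first, or deriving the ID from a hash of the code, keeps them stable.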

answered Nov 01 '22 at 15:11 by chris
