
Using Sphinx search with MongoDB as data source

Tags:

mongodb

sphinx

We decided to use MongoDB for a web application (instead of MySQL) but want to stay with Sphinx for indexing and searching all data stored in MongoDB. The MongoDB object ID is a hash by default, and we want to keep it that way, which creates a problem when using Sphinx. As the Sphinx documentation says:

ALL DOCUMENT IDS MUST BE UNIQUE UNSIGNED NON-ZERO INTEGER NUMBERS (32-BIT OR 64-BIT, DEPENDING ON BUILD TIME SETTINGS).

So what's the best way to solve this problem? How can we map the MongoDB object ID to a non-zero integer (and back)?

UPDATE

casey's answer is the right direction to look into. However, as it turns out, string attributes in the current dev version are only available for the SQL data source. For xmlpipe it's necessary to apply a patch to the checked-out source. More information on this can be found in the Sphinx forum.

aurora asked Nov 05 '09 13:11



1 Answer

You can't use the object ID as a Sphinx document ID - MongoDB object IDs are 12 bytes (96 bits), which is bigger than the maximum size of Sphinx's document IDs (64 bits).

Instead, you could increment a unique ID while generating the XML that Sphinx is going to process (I'm assuming you are using xmlpipe to get your Mongo data into Sphinx?) and store the MongoDB object ID as a string attribute in Sphinx.
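As a rough sketch of that idea in Python: the function below numbers documents from 1 and writes the MongoDB object ID into a string attribute of an xmlpipe2 stream. The names `build_xmlpipe2`, `content` and `mongo_id` are made up for illustration, the input is assumed to be plain dicts (for example, what a MongoDB driver returns), and it assumes a Sphinx build in which xmlpipe2 accepts `type="string"` attributes.

```python
from xml.sax.saxutils import escape


def build_xmlpipe2(docs, text_field="content", id_attr="mongo_id"):
    """Return an xmlpipe2 stream for `docs` as a single string.

    Each doc is assumed to be a dict with an "_id" (the MongoDB object ID,
    or its hex string) and some text stored under `text_field`.
    """
    lines = [
        '<?xml version="1.0" encoding="utf-8"?>',
        "<sphinx:docset>",
        "<sphinx:schema>",
        f'<sphinx:field name="{text_field}"/>',
        # Hypothetical attribute name; holds the original object ID as text.
        f'<sphinx:attr name="{id_attr}" type="string"/>',
        "</sphinx:schema>",
    ]
    # Sphinx document IDs must be unique non-zero integers, so count from 1.
    for doc_id, doc in enumerate(docs, start=1):
        lines.append(f'<sphinx:document id="{doc_id}">')
        lines.append(
            f"<{text_field}>{escape(str(doc[text_field]))}</{text_field}>"
        )
        lines.append(f"<{id_attr}>{escape(str(doc['_id']))}</{id_attr}>")
        lines.append("</sphinx:document>")
    lines.append("</sphinx:docset>")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        {"_id": "4af2f1a0e4b0c1a2b3c4d5e6", "content": "first document"},
        {"_id": "4af2f1a0e4b0c1a2b3c4d5e7", "content": "second document"},
    ]
    print(build_xmlpipe2(sample))
```

Note that the integer IDs here are only positions in the current export, so they can change between reindexing runs. To get back to the MongoDB document, read the string attribute from the search result rather than relying on the integer.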

You'll need the latest development version of Sphinx to do this - see my answer to this question for a little more detail: Sphinx without using an auto_increment id

outcassed answered Oct 12 '22 12:10