Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Create a filter in Sphinx with text/string value

Tags:

php

sphinx

I have Sphinx Search installed as my search engine and I'm trying to add a few extra features to the search using setFilter() and SetSelect() which should allow me to do WHERE/AND clauses. But whenever I try a search, it returns no results instead of results.

Here is my sphinx.conf: http://pastebin.com/M6Kd71u0

And here's the PHP code:

require("sphinxapi.php");

$host = "localhost";
$port = 9312;
$index = "llgenre";
$select1 = "cartoon";
$label6 = "children";
$type = 4;
$limit = 20;
$ranker = SPH_RANK_PROXIMITY_BM25;
$mode = SPH_MATCH_ALL;

$sphinx = new SphinxClient();
$sphinx->setServer($host, $port);
$sphinx->setConnectTimeout(0);
$sphinx->setMatchMode($mode);
$sphinx->setRankingMode($ranker);
$sphinx->setSelect('*, select1="'.$select1.'" AND label6="'.$label6.'" AS mycond');
$sphinx->setFilter('mycond', array(1));

$res = $sphinx->query($type, $index);

die(var_dump($res));

How can I search by type = 4, filter by select1 with cartoon and finally on label6 with children?

like image 738
Roukmoute Avatar asked Apr 20 '11 20:04

Roukmoute


1 Answers

I believe what you're attempting to do is to filter strings as attributes. Referring to the Sphinx FAQ, they outline the procedure

How do I filter, sort, or group by string column without string attributes?

You can do all of this, except for precise arbtrary-length sorting over several indexes.

To filter and group, you can replace the string with an unique numeric ID. Sometimes its possible to create a lookup dictionary in the database (eg. for fixed lists of cities or countries), or even use an existing one, replace strings with their IDs in that dictionary, then filter and group on that ID. If not, you can always replace the string with its checksum, eg. CRC32() or (any) 64 bits taken from MD5() at indexing time (no need to alter the tables!), store it using sql_attr_uint or sql_attr_bigint respectively, and then filter or group on that checksum attribute. (Note that there's a certain chance of CRC32() collisions if you have millions of strings but practically zero chance of MD5() collisions.)

So, in my sphinx.conf, I might have the following...

sql_query = SELECT CRC32(string_field) AS `string_field` FROM `table`

sql_attr_uint = string_field

Then in PHP, I would apply a filter on the field like so...

$sphinx->SetFilter('string_field', array(crc32( 'filter_string' ));

--

Unfortunately, PHP has an annoying problem(bug?) when converting to crc32... something involving unsigned integers or something..

I use the following function to convert correctly

class Encode {
    public static function crc32($val){
        $checksum = crc32($val);
        if($checksum < 0) $checksum += 4294967296;
        return $checksum;
    }
}

--

Be careful of character case! You may choose to convert the column to lower case while indexing eg.

sql_query = SELECT CRC32(LOWER(string_field)) AS `string_field` FROM `table`

and searching...

$sphinx->SetFilter('string_field', array(crc32(strtolower( 'Filter_String' )));
like image 158
MeatFlavourDev Avatar answered Oct 14 '22 12:10

MeatFlavourDev