
Problem with faceted search

Tags: solr

I'm running some faceted searches in Solr, but I don't get the results I expect when the values in the facet field contain several words.

Example: “animal” field with the following entries:

        A horse

        Black horse

        Black horse

The faceted search returns "horse(3)" as the top result, whereas I would like to get "Black horse(2)".

Here is my schema.xml. The search field is BUSQUEDA and the facet field is SUPERFICIE. I think I have tried most of the possible combinations of the defined types for these two fields, but it still doesn't work.

<?xml version="1.0" encoding="UTF-8" ?>
        <schema name="example" version="1.2">
         <types>

     <fieldType name="string" class="solr.StrField"/>

    <fieldType name="facet_texPersonal" class="solr.StrField" sortMissingLast="true" omitNorms="true">
           <analyzer>
            <tokenizer class="solr.KeywordTokenizerFactory"/>
           </analyzer>
          </fieldType>

          <fieldType name="facet_tex" class="solr.TextField" sortMissingLast="true" omitNorms="true">
           <analyzer>
            <tokenizer class="solr.KeywordTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory" />
            <filter class="solr.TrimFilterFactory" />
           </analyzer>
          </fieldType>

          <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
           <analyzer type="index">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"
             enablePositionIncrements="true"/>
            <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" 
             catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
           </analyzer>
           <analyzer type="query">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" 
             enablePositionIncrements="true"/>
            <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" 
             catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
           </analyzer>
          </fieldType>

          <fieldType name="textTight" class="solr.TextField" positionIncrementGap="100" >
            <analyzer>
           <tokenizer class="solr.WhitespaceTokenizerFactory"/>
           <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
           <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
           <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0"        catenateWords="1" catenateNumbers="1" catenateAll="0"/>
           <filter class="solr.LowerCaseFilterFactory"/>
           <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
           <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
            </analyzer>
          </fieldType>

          <fieldType name="textMultidioma" class="solr.TextField" positionIncrementGap="100">
           <analyzer type="index">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" 
              enablePositionIncrements="true" />
            <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" 
              catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
            <filter class="solr.LowerCaseFilterFactory"/>
           </analyzer>
           <analyzer type="query">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
            <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" 
             catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
            <filter class="solr.LowerCaseFilterFactory"/>
           </analyzer>
          </fieldType>

         </types>

         <fields>
          <field name="BUSQUEDA" type="facet_tex" indexed="true" stored="true"/>
          <field name="SUPERFICIE" type="facet_tex" indexed="true" stored="true"/>
          <field name="NOMBRE" type="string" indexed="true" stored="true"/>
         </fields>
         <uniqueKey>NOMBRE</uniqueKey>
         <defaultSearchField>BUSQUEDA</defaultSearchField>
        </schema>

Any suggestions?

Thanks a bunch in advance!

Carlos asked Feb 08 '10 16:02




2 Answers

You have to facet on a non-tokenized field: either a field type of class solr.StrField, or a solr.TextField whose analyzer uses solr.KeywordTokenizerFactory. This thread explains it in detail.

Mauricio Scheffer answered Oct 23 '22 05:10


We had multi-word faceted fields working for a project that I worked on previously. Here is (part of) the schema.xml relating to this:

<schema name="example" version="1.2">
 <types>
  <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true" />
    ...
 </types>  
 <fields>
  <field name="grant_type" type="string" indexed="true" stored="true" />
  ...
 </fields>
</schema>

As Mauricio has highlighted, the facet field has to be non-tokenized (not split into separate words). In the config above we are using the 'solr.StrField' (non-tokenized) field type.

Further hints for faceted field types (not converting to lowercase, not stripping out punctuation, etc.) can be found on the Solr Faceting Overview page.
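If the same text also needs to be searchable word by word, one common arrangement is to keep a tokenized field for searching and copy its raw value into a separate untokenized field for faceting. The following is only a sketch under assumptions: the field names "animal" and "animal_facet" are hypothetical, and the "text" and "string" types are taken from the schema in the question. Because a StrField stores the value verbatim, the facet labels keep their original case, whereas the question's facet_tex type lowercases them.

```xml
<!-- Sketch only: "animal" and "animal_facet" are hypothetical field names. -->
<fields>
  <!-- Tokenized field used for full-text search ("text" type from the question's schema). -->
  <field name="animal" type="text" indexed="true" stored="true"/>
  <!-- Untokenized copy used only for faceting; StrField indexes the whole value as one term. -->
  <field name="animal_facet" type="string" indexed="true" stored="false"/>
</fields>

<!-- At index time, copy the raw input of "animal" into "animal_facet". -->
<copyField source="animal" dest="animal_facet"/>
```

With this in place, a request that adds facet=true&facet.field=animal_facet should count whole values such as "Black horse" rather than individual words. Documents have to be reindexed after the schema change, since facet counts are computed from the indexed terms.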

Jonathan Williams answered Oct 23 '22 04:10