
Solr: Search for hyphenated terms gives 0 results

Tags: solr

I am unable to retrieve hyphenated terms in my Solr search results. For example, when I search for superman or super man, I expect to see titles such as super-man and super-man3 in the results, but I get 0 results.

The FieldType is as follows:

<fieldType name="autocomplete_edge" class="solr.TextField">
    <analyzer type="index">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt" />
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.PatternReplaceFilterFactory" pattern="([\.,;:-_])" replacement=" " replace="all" />
        <filter class="solr.EdgeNGramFilterFactory" maxGramSize="30" minGramSize="1" />
        <filter class="solr.PatternReplaceFilterFactory" pattern="([^\w\d\*æøåÆØÅ ])" replacement="" replace="all" />
    </analyzer>
    <analyzer type="query">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt" />
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.PatternReplaceFilterFactory" pattern="([\.,;:-_])" replacement=" " replace="all" />
        <filter class="solr.PatternReplaceFilterFactory" pattern="([^\w\d\*æøåÆØÅ ])" replacement="" replace="all" />
        <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{30})(.*)?" replacement="$1" replace="all" />
    </analyzer>
</fieldType> 
anand tiwari asked Oct 21 '22 06:10


1 Answer

I would suggest using WordDelimiterFilterFactory for your use case.

WordDelimiterFilterFactory splits tokens on special characters (such as hyphens) and on letter/number boundaries, and it can also keep the original token (the preserveOriginal option), so the indexed terms can match several variants of the search term.

For example:
generateWordParts splits super-man -> super, man
splitOnNumerics splits super-man3 -> super, man, 3
catenateWords joins the word parts: super-man -> superman
catenateAll joins all parts: super-man3 -> superman3

This lets a query match any of these combinations of the same words, whether they are written with a hyphen, with a space, or joined together.
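
As a rough sketch only, the options above might be combined in a field type like the one below. The name text_hyphen is hypothetical, and the choice of WhitespaceTokenizerFactory (so the hyphen survives tokenization and reaches the filter) and the exact flag values are assumptions, not something taken from the question's schema. The edge n-gram step from the original autocomplete_edge type is left out here for brevity.

```xml
<!-- Hypothetical field type: name and flag values are illustrative assumptions -->
<fieldType name="text_hyphen" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <!-- Split on whitespace only, so "super-man3" reaches the filter intact -->
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <!-- Emit the parts, the joined forms, and the original token -->
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                splitOnNumerics="1"
                catenateWords="1"
                catenateAll="1"
                preserveOriginal="1" />
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <!-- At query time only split; the joined forms already exist in the index -->
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                splitOnNumerics="1"
                catenateWords="0"
                catenateAll="0"
                preserveOriginal="1" />
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
</fieldType>
```

With a setup along these lines, indexing super-man3 should produce tokens such as super, man, 3, superman, superman3 and super-man3, so queries for superman or super man have something to match. The Analysis screen in the Solr admin UI is a convenient way to check what a given configuration actually emits. Newer Solr releases deprecate this filter in favor of WordDelimiterGraphFilterFactory, so the class name may need adjusting depending on the version in use.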

Jayendra answered Oct 25 '22 20:10
