I use the following filter in my schema.xml:
<filter class="solr.EdgeNGramFilterFactory" minGramSize="4" maxGramSize="15" side="front"/>
How can I boost the longer ngrams? For example, when I search for "bookpage", a document which contains "bookpage" should be rated a lot higher than a document with only "book".
I don't know of a way to boost dynamically based on term length (for example, with a Function Query operator). I suspect there isn't one.
That said, I often want to approximate the logic you're looking for: longer term matches deserve a higher semantic weight.
Most commonly, I will index the text value into two different fields. One is a minimally-processed text field without ngrams. The other is similar, but also processed with ngrams.
Here are some sample excerpts from a schema that I have used in this fashion. For searches against this schema, I would boost the text field heavily over the text_ngram field. Thus any match against the text field greatly influences the relevancy, while matches against text_ngram can still pick up perhaps-relevant results.
<?xml version="1.0" encoding="UTF-8"?>
<schema name="Sunspot Customized NZ" version="1.0">
  <types>
    <!--
      A text type with minimal text processing, for the greatest semantic
      value in a term match. Boost this field heavily.
    -->
    <fieldType name="text" class="solr.TextField" omitNorms="false">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.StandardFilterFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
      </analyzer>
    </fieldType>
    <!--
      Looser matches with NGram processing for substrings of terms and synonyms
    -->
    <fieldType name="text_ngram" class="solr.TextField" omitNorms="false">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory" />
        <filter class="solr.StandardFilterFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="6" side="front" />
      </analyzer>
    </fieldType>
    <!-- other stuff -->
  </types>
  <fields>
    <!-- id, other scalar values -->
    <!-- catch-all for the text and text_ngram types -->
    <field name="text"       stored="false" type="text"        multiValued="true"  indexed="true" />
    <field name="text_ngram" stored="false" type="text_ngram"  multiValued="true"  indexed="true" />
    <!-- various dynamicField definitions -->
    <!-- sample dynamicField definitions for text and text_ngram -->
    <dynamicField name="*_text"   type="text" indexed="true" stored="false" multiValued="false" />
    <dynamicField name="*_text_ngram"   type="text_ngram" indexed="true" stored="false" multiValued="false" />
  </fields>
  <!-- copy text fields into my text and text_ngram catch-all fields -->
  <copyField source="*_text"  dest="text" />
  <copyField source="*_text"  dest="text_ngram" />
</schema>
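The boost itself is applied at query time, not in the schema. As a minimal sketch, assuming the edismax query parser and a request handler configured in solrconfig.xml (the handler name and the boost value of 10 are illustrative assumptions), the defaults might look something like this:

```xml
<!-- solrconfig.xml (sketch): weight exact-term matches above ngram matches -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- text^10 is an illustrative weight; tune it against real queries -->
    <str name="qf">text^10 text_ngram</str>
  </lst>
</requestHandler>
```

The same weights can be passed per request through the qf parameter instead of being fixed in the handler defaults.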
This isn't exactly what you're looking for, but you could use a similar approach.
For example, create a small collection of intermediate NGram-processed field types (say, with gram lengths 1-3, 4-6, and 7-9) and give the longer ranges progressively higher boosts.
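A rough sketch of that idea follows, using only two of the tiers to keep it short. The type names, field names, and gram ranges are hypothetical, and the fragments would go inside the existing types and fields sections of a schema like the one above:

```xml
<!-- Sketch only: names and gram ranges are hypothetical -->

<!-- Inside <types>: prefixes of 4 to 6 characters -->
<fieldType name="text_ngram_mid" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="4" maxGramSize="6" side="front" />
  </analyzer>
</fieldType>

<!-- Inside <types>: prefixes of 7 to 9 characters -->
<fieldType name="text_ngram_long" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="7" maxGramSize="9" side="front" />
  </analyzer>
</fieldType>

<!-- Inside <fields>: one catch-all field per tier -->
<field name="text_ngram_mid"  type="text_ngram_mid"  indexed="true" stored="false" multiValued="true" />
<field name="text_ngram_long" type="text_ngram_long" indexed="true" stored="false" multiValued="true" />

<!-- After </fields>: feed every tier from the same source fields -->
<copyField source="*_text" dest="text_ngram_mid" />
<copyField source="*_text" dest="text_ngram_long" />
```

A query could then weight the tiers with something like qf=text^10 text_ngram_long^6 text_ngram_mid^3, where the numbers are placeholders to tune. As far as I understand the filter, a term shorter than minGramSize produces no grams at all, so a document containing only "book" should contribute nothing to the 7-9 field, while "bookpage" would match there and pick up the larger boost.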