
What is the correct Solr fieldType to use for sorting integer values?

I am using Solr 3.6.1. What is the correct field type to use for a Solr sort field containing integer values? I need this field only for sorting and will never do range queries on it. Should I use integer or sint?

In schema.xml, the sint type is declared as:

    <!-- Numeric field types that manipulate the value into
         a string value that isn't human-readable in its internal form,
         but with a lexicographic ordering the same as the numeric ordering,
         so that range queries work correctly. -->
    <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>

whereas the integer type is declared with this comment:

    <!-- numeric field types that store and index the text
         value verbatim (and hence don't support range queries, since the
         lexicographic ordering isn't equal to the numeric ordering) -->
    <fieldType name="integer" class="solr.IntField" omitNorms="true"/>

The main reason I am asking is that every Solr sort I do on an sint field (I have lots of them declared as dynamic fields) populates the (unconfigurable) Lucene fieldCache. On the stats page (http://HOST:PORT/solr/CORE/admin/stats.jsp), under fieldCache, I see that sint sorts are stored as

org.apache.lucene.search.FieldCache$StringIndex

whereas integer sorts are stored as

org.apache.lucene.search.FieldCache.DEFAULT_INT_PARSER

which I believe consumes less space?


UPDATE: The Solr 3.6.1 schema.xml declares int as a TrieIntField, i.e.:

<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>

The declarations above were from an older Solr version.

arun asked Nov 14 '12 02:11


1 Answer

If you don't need range queries, use "integer", since sorting works correctly on both types.

From the documentation:

Sortable FieldTypes like sint, sdouble are a bit of a misnomer. They are not needed for Sorting in the sense described above, but are needed when doing RangeQuery queries. Sortables, in fact, refer to the notion of making the number sort correctly lexicographically as Strings. That is, if this is not done, the numbers 1..10 sort lexicographically as 1,10, 2, 3... Using an sint, however remedies this. If, however, you don't need to do RangeQuery queries and only need to sort on the field, then just use an int or double or the equivalent appropriate class. You will save yourself time and memory.
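To make the lexicographic-versus-numeric point concrete, here is a small Python sketch. It is not Solr code and does not reproduce SortableIntField's actual internal encoding; the zero-padding in the hypothetical `encode` helper is only a stand-in for "some string form whose lexicographic order matches numeric order", and it assumes non-negative values.

```python
values = [1, 2, 3, 10, 20, 100]

# Numbers indexed verbatim as text: string comparison puts "10" before "2".
as_text = sorted(str(v) for v in values)

# Hypothetical order-preserving encoding (illustration only, not what
# SortableIntField really does): zero-pad every number to a fixed width.
WIDTH = 10

def encode(n: int) -> str:
    """Return a fixed-width string form of a non-negative integer."""
    return str(n).zfill(WIDTH)

# Sorting the encoded strings lexicographically, then decoding,
# gives the same order as sorting the integers numerically.
as_padded = sorted(encode(v) for v in values)
decoded = [int(s) for s in as_padded]
numeric = sorted(values)

print(as_text)  # lexicographic order of the verbatim strings
print(decoded)  # order recovered from the padded strings
print(numeric)  # plain numeric order
```

The first list shows the "1, 10, 2, 3..." problem the quote mentions, which is what matters for range queries over the indexed terms. Per the quoted documentation, sorting on the field does not need this encoding.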

Jayendra answered Sep 19 '22 02:09