I am using Solr 3.6.1. What is the correct field type for a Solr sort field containing integer values? I need this field only for sorting and will never run range queries on it. Should I use integer or sint?
I see that in schema.xml the sint type is declared as:
<!-- Numeric field types that manipulate the value into
a string value that isn't human-readable in its internal form,
but with a lexicographic ordering the same as the numeric ordering,
so that range queries work correctly. -->
<fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
whereas the integer type is documented as follows:
<!-- numeric field types that store and index the text
value verbatim (and hence don't support range queries, since the
lexicographic ordering isn't equal to the numeric ordering) -->
<fieldType name="integer" class="solr.IntField" omitNorms="true"/>
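To see what the sint comment means by a lexicographic ordering that matches the numeric ordering, here is a minimal Java sketch of one order-preserving int-to-string encoding. It is a hypothetical illustration, not the format Solr's SortableIntField actually writes: it flips the sign bit and prints the result as eight fixed-width hex digits, so comparing the encoded strings gives the same answer as comparing the ints. The class and method names (SortableIntSketch, encode, decode, sortViaEncodedStrings) are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SortableIntSketch {

    // Hypothetical order-preserving encoding (NOT Solr's SortableIntField format):
    // flipping the sign bit maps Integer.MIN_VALUE..MAX_VALUE onto
    // 0x00000000..0xffffffff in unsigned order, and fixed-width hex keeps
    // plain string comparison consistent with that unsigned order.
    static String encode(int value) {
        return String.format("%08x", value ^ 0x80000000);
    }

    // Reverse the encoding: parse the unsigned hex, then flip the sign bit back.
    static int decode(String encoded) {
        return Integer.parseUnsignedInt(encoded, 16) ^ 0x80000000;
    }

    // Encode each value, sort the strings lexicographically, then decode again.
    static List<Integer> sortViaEncodedStrings(List<Integer> values) {
        List<String> encoded = new ArrayList<>();
        for (int v : values) {
            encoded.add(encode(v));
        }
        Collections.sort(encoded); // plain String ordering, like indexed terms
        List<Integer> decoded = new ArrayList<>();
        for (String s : encoded) {
            decoded.add(decode(s));
        }
        return decoded;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(10, -3, 2, 1, 0, -20);
        System.out.println(sortViaEncodedStrings(values)); // [-20, -3, 0, 1, 2, 10]
        // A range-style comparison on the encoded strings agrees with the numeric one.
        System.out.println(encode(2).compareTo(encode(10)) < 0); // true
    }
}
```

Because the encoded terms already compare correctly as strings, a term-based range query over them behaves numerically; that is the property the sint comment is describing, and it is what the verbatim integer type lacks.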
The main reason I am asking is that every Solr sort I do on an sint field (I have lots of them declared as dynamic fields) populates the (unconfigurable) Lucene fieldCache. On the stats page (http://HOST:PORT/solr/CORE/admin/stats.jsp), under fieldCache, I see that sint sorts are stored as
org.apache.lucene.search.FieldCache$StringIndex
whereas integer sorts are stored as
org.apache.lucene.search.FieldCache.DEFAULT_INT_PARSER
which I believe consumes less space. Is that correct?
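A rough back-of-envelope model of the two cache entries, sketched in Java. It assumes the Lucene 3.x layout in which an int sort caches one int[] slot per document, while a StringIndex caches an int[] of per-document ordinals plus a String[] lookup of the distinct terms (with one extra slot assumed for documents without a value). The JVM overheads (REF_BYTES, STRING_OVERHEAD) and the index sizes in main are assumed values for illustration, not measurements, and the class name FieldCacheEstimate is hypothetical.

```java
public class FieldCacheEstimate {

    // Assumed JVM costs; real values depend on the JVM, 32/64-bit, compressed oops.
    static final long REF_BYTES = 8;        // assumed size of one object reference
    static final long STRING_OVERHEAD = 40; // assumed String header + char[] header

    // int[]-style cache: one 4-byte slot per document.
    static long intCacheBytes(long maxDoc) {
        return 4L * maxDoc;
    }

    // StringIndex-style cache: an int[] of per-document ordinals, a reference
    // array of distinct terms (plus one assumed slot for "no value"), and the
    // term Strings themselves (2 bytes per char).
    static long stringIndexBytes(long maxDoc, long uniqueTerms, long avgTermChars) {
        long order = 4L * maxDoc;
        long lookupRefs = REF_BYTES * (uniqueTerms + 1);
        long strings = uniqueTerms * (STRING_OVERHEAD + 2L * avgTermChars);
        return order + lookupRefs + strings;
    }

    public static void main(String[] args) {
        long maxDoc = 10_000_000L;     // hypothetical number of documents
        long uniqueTerms = 1_000_000L; // hypothetical number of distinct values
        long avgTermChars = 3;         // hypothetical encoded term length in chars
        System.out.println("int[] cache:       " + intCacheBytes(maxDoc) + " bytes");
        System.out.println("StringIndex cache: "
                + stringIndexBytes(maxDoc, uniqueTerms, avgTermChars) + " bytes");
    }
}
```

Under this model the StringIndex can never be smaller than the plain int[], because it contains an int[] of the same length plus the distinct terms on top of it; how much larger it gets depends mainly on the number of distinct values.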
UPDATE: the Solr 3.6.1 schema.xml declares int as a TrieIntField, i.e. as:
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
The schema.xml snippets quoted earlier came from an older Solr version.
If you don't need range queries, use "integer", since sorting works correctly on both types.
From the documentation:
Sortable FieldTypes like sint and sdouble are a bit of a misnomer. They are not needed for Sorting in the sense described above, but are needed when doing RangeQuery queries. Sortables, in fact, refer to the notion of making the number sort correctly lexicographically as Strings. That is, if this is not done, the numbers 1..10 sort lexicographically as 1, 10, 2, 3, ... Using an sint, however, remedies this. If, however, you don't need to do RangeQuery queries and only need to sort on the field, then just use an int or double or the equivalent appropriate class. You will save yourself time and memory.
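The 1..10 example can be reproduced in a few lines of Java (the class and method names are illustrative). Sorting the numbers as strings shows the term order a verbatim integer field has, which is what breaks range queries; sorting them as ints shows the order a numeric sort produces, which, going by the DEFAULT_INT_PARSER entry seen on the stats page, is how sorts on the integer type are evaluated.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LexicographicVsNumeric {

    // Sort the numbers 1..n as plain strings (the order of verbatim text terms).
    static List<String> sortAsStrings(int n) {
        List<String> values = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            values.add(Integer.toString(i));
        }
        Collections.sort(values);
        return values;
    }

    // Sort the same numbers as ints (the order a numeric sort produces).
    static List<Integer> sortAsInts(int n) {
        List<Integer> values = new ArrayList<>();
        for (int i = n; i >= 1; i--) {
            values.add(i);
        }
        Collections.sort(values);
        return values;
    }

    public static void main(String[] args) {
        System.out.println(sortAsStrings(10)); // [1, 10, 2, 3, 4, 5, 6, 7, 8, 9]
        System.out.println(sortAsInts(10));    // [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    }
}
```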