
How to query integers, floats in lucene and how to store (NumericComparator)?

Tags: solr, lucene

A bigger question: will Solr even be able to support this? I know I have seen Lucene do this, and Solr is built on Lucene.

I found an example somewhere through Google but can't find it again. It was not complete either: I don't think it showed the query side, i.e. how to write the query statement for Lucene. I remember seeing a NumericField, and there is also this NumericComparator.

Basically, I am trying out a NoSQL ORM solution (on GitHub) that offers indexing. The client decides how many indexes there are per table and the partitioning methodology; you add entities to an index and remove them yourself. You can use named queries, though you have to get the index by name before querying, since one table may have millions of indexes. The two main things I want to achieve are that it all works with an in-memory fake NoSQL db and an in-memory index (Lucene's RAMDirectory), AND that I can then swap those for Cassandra and Solr.

I basically need to

  1. figure out how to store integers, floats, etc.
  2. figure out how to write a lucene query when the targets are strings, floats, ints, etc.

If you need more details, the main query code of the project is currently at https://github.com/deanhiller/nosqlORM/blob/master/input/javasrc/com/alvazan/orm/layer3/spi/index/inmemory/MemoryIndexWriter.java

On line 172 you can see that I add a new Field every time, but unfortunately some of these values may be ints.

BIG QUESTION: Can Solr even support int vs. string? (If not, I will have to go with the hack of left-padding ints, longs, etc. with zeros so that all values have the same length.)
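For reference, the zero-padding hack mentioned above can be sketched in plain Java. This is only an illustration of why padding makes string order match numeric order; `PaddedNumbers` and `pad` are hypothetical names, not part of Lucene or Solr, and the sketch assumes non-negative ints only.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper, not a Lucene or Solr class.
class PaddedNumbers {
    // Width 10 is enough for any non-negative int (max 2147483647).
    static String pad(int value) {
        if (value < 0) {
            // Assumption: negative values would need a different encoding.
            throw new IllegalArgumentException("negative values are not handled in this sketch");
        }
        return String.format(Locale.ROOT, "%010d", value);
    }

    public static void main(String[] args) {
        // Unpadded strings sort lexicographically, so "9" lands after "1200".
        String[] raw = {"9", "10", "1200"};
        Arrays.sort(raw);
        System.out.println(Arrays.toString(raw));

        // Padded strings sort in the same order as the underlying numbers.
        String[] padded = {pad(9), pad(10), pad(1200)};
        Arrays.sort(padded);
        System.out.println(Arrays.toString(padded));
    }
}
```

With every value padded to the same width, a plain string range query over the padded terms behaves like a numeric range, which is what the hack relies on.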

If Solr can support it, what is the best way to do this in Lucene, or is there a good example?

The main index interface, retrieved from NoSqlEntityManager.getIndex(Class clazz, String indexPartitionName), is here (though I am not sure it matters): https://github.com/deanhiller/nosqlORM/blob/master/input/javasrc/com/alvazan/orm/api/Index.java

thanks, Dean

Dean Hiller asked May 01 '12 23:05



2 Answers

From the example SOLR schema.xml file:

<!--
      Default numeric field types. For faster range queries, consider the tint/tfloat/tlong/tdouble types.
    -->
<fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
<fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>
<!--
     Numeric field types that index each value at various levels of precision
     to accelerate range queries when the number of values between the range
     endpoints is large. See the javadoc for NumericRangeQuery for internal
     implementation details.

     Smaller precisionStep values (specified in bits) will lead to more tokens
     indexed per value, slightly larger index size, and faster range queries.
     A precisionStep of 0 disables indexing at different precision levels.
    -->
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" positionIncrementGap="0"/>
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" positionIncrementGap="0"/>

So if you index a field as one of the field types above and then query it by field name (e.g. myIntField:1234), it will do the "right thing", and you can also run range searches against it (myIntField:[1200 TO 1300]). The same goes for floats, etc.
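The precisionStep comment in the schema above can be illustrated with a small sketch. This is a simplified picture of the idea only, not Lucene's actual term encoding (the real one also records the shift in each term and maps signed values to a sortable form); `PrecisionStepSketch` and `prefixes` are made-up names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not a Lucene or Solr class.
class PrecisionStepSketch {
    // Returns the value with 0, step, 2*step, ... low bits dropped,
    // i.e. one coarser "prefix" of the 32-bit value per precision level.
    static List<Integer> prefixes(int value, int precisionStep) {
        if (precisionStep < 1) {
            throw new IllegalArgumentException("precisionStep must be >= 1 in this sketch");
        }
        List<Integer> out = new ArrayList<>();
        for (int shift = 0; shift < 32; shift += precisionStep) {
            out.add(value >>> shift);
        }
        return out;
    }

    public static void main(String[] args) {
        // precisionStep=8 gives 4 entries for an int, precisionStep=4 gives 8.
        System.out.println(prefixes(1234, 8));
        System.out.println(prefixes(1234, 4).size());
    }
}
```

A smaller step produces more entries per value (a slightly larger index), and a range query can then be covered by a few coarse prefixes plus a few exact values at the edges, which is the trade-off the schema comment describes.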

nickdos answered Nov 11 '22 14:11



I think we can use the org.apache.lucene.document.NumericField class. Its setter methods (setIntValue, setLongValue, setFloatValue, setDoubleValue) support int, long, float and double. For other data types (e.g. bool, datetime), we can convert the values to int or long first.
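The conversion step for bool and datetime could look roughly like the sketch below. It uses only the JDK and stops at producing the int or long that would then be stored in a numeric field; `NumericEncoding` and its methods are hypothetical names, and the choice of 0/1 and epoch milliseconds is an assumption, not something Lucene prescribes.

```java
import java.time.Instant;

// Hypothetical helper, not a Lucene class.
class NumericEncoding {
    // Assumption: booleans are stored as 0 (false) or 1 (true).
    static int encodeBoolean(boolean value) {
        return value ? 1 : 0;
    }

    static boolean decodeBoolean(int stored) {
        return stored != 0;
    }

    // Assumption: timestamps are stored as milliseconds since the Unix epoch,
    // so numeric order matches chronological order.
    static long encodeInstant(Instant value) {
        return value.toEpochMilli();
    }

    static Instant decodeInstant(long stored) {
        return Instant.ofEpochMilli(stored);
    }

    public static void main(String[] args) {
        Instant asked = Instant.parse("2012-05-01T23:05:00Z");
        long stored = encodeInstant(asked);
        System.out.println(decodeInstant(stored).equals(asked));
    }
}
```

Because the encoded longs preserve chronological order, a numeric range query over them acts as a date range query.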

BTW, I looked at Lucene's latest source code, which adds new classes: FloatField, IntField, LongField and DoubleField. They will be included in the next release. http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/core/src/java/org/apache/lucene/document/

Brian Ling answered Nov 11 '22 14:11
