
Sorting a String field alphabetically in Lucene 5.0

I'm having issues sorting on string fields in Lucene 5.0. Apparently the way sorting works has changed since Lucene 4. Below is a snippet showing some of the fields that are indexed for my documents.

@Override
public Document generateDocument(Process entity)
{
    Document doc = new Document();
    doc.add(new IntField(id, entity.getID(), Field.Store.YES));
    doc.add(new TextField(title, entity.getProcessName(), Field.Store.YES));
    doc.add(new IntField(organizationID, entity.getOrganizationID(), Field.Store.YES));
    doc.add(new StringField(versionDate, DateTools.dateToString(entity.getVersionDate(), DateTools.Resolution.SECOND), Field.Store.YES));
    doc.add(new LongField(entityDate, entity.getVersionDate().getTime(), Field.Store.YES)); 
    return doc;
}

I would like to sort on relevance first, which works just fine. The issue I have is that sorting on the title field doesn't work. I've created a SortField, which I'm trying to use with a TopFieldCollector after a chain of method calls.

public BaseSearchCore<Process, ProcessSearchResultScore>.SearchContainer search(String searchQuery, Filter filter, int page, int hitsPerPage) throws IOException, ParseException
    {
    SortField titleSort = new SortField(title, SortField.Type.STRING, true);
    return super.search(searchQuery, filter, page, hitsPerPage, titleSort);
    }

Which goes to:

public SearchContainer search(String searchQuery, Filter filter, int page, int hitsPerPage, SortField... sortfields) throws IOException, ParseException 
    {
        Query query = getQuery(searchQuery);
        TopFieldCollector paginate = getCollector(sortfields);
        int startIndex = (page -1) * hitsPerPage;
        ScoreDoc[] hits = executeSearch(query, paginate, filter, startIndex, hitsPerPage);

        return collectResults(query, filter, hitsPerPage, hits, page);
  }

And finally to the method that applies the sort field:

private TopFieldCollector getCollector(SortField sortField) throws IOException
    {
        SortField[] sortFields = new SortField[] {SortField.FIELD_SCORE, sortField};
        Sort sorter = new Sort(sortFields);
        TopFieldCollector collector = TopFieldCollector.create(sorter, 25000, true, false, true);
        return collector;
    }

Using the returned collector, a regular query is performed and a result is returned. However, if I try to sort with this SortField, I get this exception:

java.lang.IllegalStateException: unexpected docvalues type NONE for field 'title' (expected=SORTED). Use UninvertingReader or index with docvalues.

How am I supposed to index a string field so that I can sort it alphabetically (using SortFields) in Lucene 5? Any code examples or snippets would be much appreciated.

Searching by relevance works just fine, but when users enter empty search queries all the results have the same relevance. For those queries I'd rather sort by the results' titles, which is what is causing issues in this version of Lucene.

Muppenz asked Apr 17 '15 09:04


2 Answers

When indexing, add a SortedDocValuesField for the field you want to sort on. This works in Lucene 5.0 and above:

doc.add(new SortedDocValuesField("title", new BytesRef(term)));

For searching use:

Sort sort = new Sort();
sort.setSort(new SortField("title", SortField.Type.STRING));            
TopDocs hits = searcher.search(bQuery.build(), pageSize, sort);
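
The question asks for relevance first, with the title acting only as a tie-breaker. That matches the Sort built in the question's getCollector, which puts SortField.FIELD_SCORE ahead of the title SortField. As a rough plain-Java model of that ordering (no Lucene involved; the `Hit` class and `sortedTitles` helper are hypothetical names made up for this illustration), the sketch below shows why the title order only becomes visible when the scores are equal, as they are for an empty query:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ScoreThenTitle {
    /** Hypothetical stand-in for a search hit: a relevance score plus a title. */
    public static final class Hit {
        public final float score;
        public final String title;

        public Hit(float score, String title) {
            this.score = score;
            this.title = title;
        }
    }

    /** Returns the titles ordered by score (highest first), then by title (ascending). */
    public static List<String> sortedTitles(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        Comparator<Hit> byScoreDesc = (a, b) -> Float.compare(b.score, a.score);
        Comparator<Hit> byTitle = Comparator.comparing(h -> h.title);
        // The title comparator is only consulted when two scores are equal.
        copy.sort(byScoreDesc.thenComparing(byTitle));

        List<String> titles = new ArrayList<>();
        for (Hit h : copy) {
            titles.add(h.title);
        }
        return titles;
    }

    public static void main(String[] args) {
        // Empty query: every hit has the same score, so the title decides.
        List<Hit> equalScores = Arrays.asList(
                new Hit(1.0f, "Payroll"), new Hit(1.0f, "Audit"), new Hit(1.0f, "Hiring"));
        System.out.println(sortedTitles(equalScores)); // [Audit, Hiring, Payroll]

        // Real query: the score dominates, the title only breaks the 2.5 tie.
        List<Hit> mixedScores = Arrays.asList(
                new Hit(2.5f, "Payroll"), new Hit(0.5f, "Audit"), new Hit(2.5f, "Hiring"));
        System.out.println(sortedTitles(mixedScores)); // [Hiring, Payroll, Audit]
    }
}
```

Note that the SortField in the question passes true as its third constructor argument. That argument is the reverse flag, so titles would come back in descending order. Drop it or pass false for A to Z.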

Surya narayana answered Sep 20 '22 09:09


A note: It's much easier to track down bugs (both for yourself and for the people you're asking) if you first boil the problem down to the smallest example that reproduces it. Rather than sort through your architecture and classes I don't have access to or know anything about, I'll address the problem as reproduced by this:

Sort sort = new Sort(new SortField("title", SortField.Type.STRING));
TopDocs docs = searcher.search(new TermQuery(new Term("title", "something")), 10, sort);

Where title is defined something like:

doc.add(new TextField("title", term, Field.Store.YES));

The best approach to sorting fields here is probably to take the exception's advice and index with docvalues. Adding DocValues to the field is essentially indexing it for sorting, and is much more efficient than the typical sorting method in Lucene 4.X, as I understand it. Adding both the typical TextField and a SortedDocValuesField under the same field name seems to work rather well, and supports both searching and sorting with that name:

doc.add(new TextField("title", term, Field.Store.YES));
doc.add(new SortedDocValuesField("title", new BytesRef(term)));
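
A caveat that may matter if "alphabetically" is meant to ignore case: sorted doc values are ordered by their bytes, so raw titles that start with an uppercase ASCII letter sort ahead of lowercase ones. One possible workaround is to store a normalized key, for example the lowercased title, in the SortedDocValuesField while keeping the original text in the TextField. The sketch below is plain Java with no Lucene dependency. `sortKey` and `compareUtf8` are hypothetical helpers that only mimic an unsigned byte-wise comparison to show the effect:

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class TitleSortKey {
    /** Hypothetical normalization: trim and lowercase with a fixed locale so keys are stable. */
    public static String sortKey(String title) {
        return title.trim().toLowerCase(Locale.ROOT);
    }

    /** Compares two strings by their UTF-8 bytes, treating each byte as unsigned. */
    public static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xFF) - (y[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        // All shared bytes are equal: the shorter value sorts first.
        return x.length - y.length;
    }

    public static void main(String[] args) {
        // Raw titles: 'Z' (0x5A) is smaller than 'a' (0x61), so "Zebra" sorts first.
        System.out.println(compareUtf8("Zebra", "apple") < 0);                   // true
        // Normalized keys: "apple" now sorts ahead of "zebra".
        System.out.println(compareUtf8(sortKey("Zebra"), sortKey("apple")) > 0); // true
    }
}
```

In the indexing code above, this would correspond to building the BytesRef from sortKey(term) instead of term for the SortedDocValuesField only.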

femtoRgon answered Sep 20 '22 09:09