Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Lucene 2.9.2: How to show results in random order?

By default, Lucene returns the query results in the order of relevance (score). You can pass a sort field (or multiple), then the results get sorted by that field.

I am looking now for a nice solution to get the search results in random order.

The bad approach:
Of course I could take ALL results and then shuffle the collection, but in case of 5 Mio search results, that's not performing well.

The elegant paged approach:
With this approach you would be able to tell Lucene the following:
a) Give me results 1 to 10 out of 5Mio results in random order
b) Then give me 11 to 20 (based on the same random sequence used in a).
c) Just to clarify: If you call a) twice you get the same random elements.

How can you implement this approach??


Update Jul27 2012: Be aware that the solution described here for Lucene 2.9.x is not working properly. Using the RandomOrderScoreDocComparator will result in having certain results twice in the resulting list.

like image 569
basZero Avatar asked Aug 26 '11 07:08

basZero


1 Answers

You could write a custom FieldComparator:

public class RandomOrderFieldComparator extends FieldComparator<Integer> {

    private final Random random = new Random();

    @Override
    public int compare(int slot1, int slot2) {
        return random.nextInt();
    }

    @Override
    public int compareBottom(int doc) throws IOException {
        return random.nextInt();
    }

    @Override
    public void copy(int slot, int doc) throws IOException {
    }

    @Override
    public void setBottom(int bottom) {
    }

    @Override
    public void setNextReader(IndexReader reader, int docBase) throws IOException {
    }

    @Override
    public Integer value(int slot) {
        return random.nextInt();
    }

}

This doesn't consume any I/O when shuffling the results. Here is my sample program that demonstrates how you use this:

public static void main(String... args) throws Exception {
    RAMDirectory directory = new RAMDirectory();

    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_33);

    IndexWriter writer = new IndexWriter(
            directory,
            new IndexWriterConfig(Version.LUCENE_33, analyzer).setOpenMode(OpenMode.CREATE_OR_APPEND)
        );

    Document alice = new Document();
    alice.add( new Field("name", "Alice", Field.Store.YES, Field.Index.ANALYZED) );
    writer.addDocument( alice );

    Document bob = new Document();
    bob.add( new Field("name", "Bob", Field.Store.YES, Field.Index.ANALYZED) );
    writer.addDocument( bob );

    Document chris = new Document();
    chris.add( new Field("name", "Chris", Field.Store.YES, Field.Index.ANALYZED) );
    writer.addDocument( chris );

    writer.close();


    IndexSearcher searcher = new IndexSearcher( directory );



    for (int pass = 1; pass <= 10; pass++) {
        Query query = new MatchAllDocsQuery();

        Sort sort = new Sort(
                new SortField(
                        "",
                        new FieldComparatorSource() {

                            @Override
                            public FieldComparator<Integer> newComparator(String fieldname, int numHits, int sortPos, boolean reversed) throws IOException {
                                return new RandomOrderFieldComparator();
                            }

                        }
                    )
            );

        TopFieldDocs topFieldDocs = searcher.search( query, 10, sort );

        System.out.print("Pass #" + pass + ":");
        for (int i = 0; i < topFieldDocs.totalHits; i++) {
            System.out.print( " " + topFieldDocs.scoreDocs[i].doc );
        }
        System.out.println();
    }
}

It yields up this output:

Pass #1: 1 0 2
Pass #2: 1 0 2
Pass #3: 0 1 2
Pass #4: 0 1 2
Pass #5: 0 1 2
Pass #6: 1 0 2
Pass #7: 0 2 1
Pass #8: 1 2 0
Pass #9: 2 0 1
Pass #10: 0 2 1

Bonus! For those of you trapped in Lucene 2

public class RandomOrderScoreDocComparator implements ScoreDocComparator {

    private final Random random = new Random();

    public int compare(ScoreDoc i, ScoreDoc j) {
        return random.nextInt();
    }

    public Comparable<?> sortValue(ScoreDoc i) {
        return Integer.valueOf( random.nextInt() );
    }

    public int sortType() {
        return SortField.CUSTOM;
    }

}

All you have to change is the Sort object:

Sort sort = new Sort(
    new SortField(
        "",
        new SortComparatorSource() {
            public ScoreDocComparator newComparator(IndexReader reader, String fieldName) throws IOException {
                return new RandomOrderScoreDocComparator();
            }
        }
    )
);
like image 155
Adam Paynter Avatar answered Sep 28 '22 10:09

Adam Paynter