By default, Lucene returns query results in order of relevance (score). You can pass one or more sort fields, and the results are then sorted by those fields.
I am now looking for a clean way to get the search results in random order.
The bad approach:
Of course I could fetch ALL results and then shuffle the collection, but with 5 million search results that does not perform well.
The elegant paged approach:
With this approach you would be able to tell Lucene the following:
a) Give me results 1 to 10 out of 5 million results in random order.
b) Then give me results 11 to 20 (based on the same random sequence used in a).
c) Just to clarify: If you call a) twice you get the same random elements.
How can this approach be implemented?
Update Jul 27, 2012: Be aware that the solution described here for Lucene 2.9.x does not work properly. Using the RandomOrderScoreDocComparator will cause certain results to appear twice in the resulting list.
You could write a custom FieldComparator:
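import java.io.IOException;
import java.util.Random;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldComparator;

// Ignores all field values and answers every comparison with a random number.
public class RandomOrderFieldComparator extends FieldComparator<Integer> {
    private final Random random = new Random();

    @Override
    public int compare(int slot1, int slot2) {
        return random.nextInt();
    }

    @Override
    public int compareBottom(int doc) throws IOException {
        return random.nextInt();
    }

    @Override
    public void copy(int slot, int doc) throws IOException {
        // nothing is stored per slot
    }

    @Override
    public void setBottom(int bottom) {
        // no bottom value is tracked
    }

    @Override
    public void setNextReader(IndexReader reader, int docBase) throws IOException {
        // no per-segment state is needed
    }

    @Override
    public Integer value(int slot) {
        return random.nextInt();
    }
}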
This doesn't perform any I/O when shuffling the results. Here is a sample program that demonstrates how to use it:
public static void main(String... args) throws Exception {
    // Build a small in-memory index with three documents.
    RAMDirectory directory = new RAMDirectory();
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_33);
    IndexWriter writer = new IndexWriter(
        directory,
        new IndexWriterConfig(Version.LUCENE_33, analyzer).setOpenMode(OpenMode.CREATE_OR_APPEND)
    );
    Document alice = new Document();
    alice.add( new Field("name", "Alice", Field.Store.YES, Field.Index.ANALYZED) );
    writer.addDocument( alice );
    Document bob = new Document();
    bob.add( new Field("name", "Bob", Field.Store.YES, Field.Index.ANALYZED) );
    writer.addDocument( bob );
    Document chris = new Document();
    chris.add( new Field("name", "Chris", Field.Store.YES, Field.Index.ANALYZED) );
    writer.addDocument( chris );
    writer.close();

    // Run the same search ten times, each time sorting with a new random comparator.
    IndexSearcher searcher = new IndexSearcher( directory );
    for (int pass = 1; pass <= 10; pass++) {
        Query query = new MatchAllDocsQuery();
        Sort sort = new Sort(
            new SortField(
                "",
                new FieldComparatorSource() {
                    @Override
                    public FieldComparator<Integer> newComparator(String fieldname, int numHits, int sortPos, boolean reversed) throws IOException {
                        return new RandomOrderFieldComparator();
                    }
                }
            )
        );
        TopFieldDocs topFieldDocs = searcher.search( query, 10, sort );
        System.out.print("Pass #" + pass + ":");
        // Iterate over the returned hits; totalHits can exceed the number of hits requested.
        for (int i = 0; i < topFieldDocs.scoreDocs.length; i++) {
            System.out.print( " " + topFieldDocs.scoreDocs[i].doc );
        }
        System.out.println();
    }
}
It yields this output:
Pass #1: 1 0 2
Pass #2: 1 0 2
Pass #3: 0 1 2
Pass #4: 0 1 2
Pass #5: 0 1 2
Pass #6: 1 0 2
Pass #7: 0 2 1
Pass #8: 1 2 0
Pass #9: 2 0 1
Pass #10: 0 2 1
For Lucene 2.9.x, the comparator can be written as a ScoreDocComparator instead:
public class RandomOrderScoreDocComparator implements ScoreDocComparator {
    private final Random random = new Random();

    public int compare(ScoreDoc i, ScoreDoc j) {
        return random.nextInt();
    }

    public Comparable<?> sortValue(ScoreDoc i) {
        return Integer.valueOf( random.nextInt() );
    }

    public int sortType() {
        return SortField.CUSTOM;
    }
}
All you have to change is the Sort object:
Sort sort = new Sort(
    new SortField(
        "",
        new SortComparatorSource() {
            public ScoreDocComparator newComparator(IndexReader reader, String fieldName) throws IOException {
                return new RandomOrderScoreDocComparator();
            }
        }
    )
);
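One caveat: both comparators above draw a fresh random number for every comparison, so the order is not reproducible between searches, and the paged behaviour asked for in the question (same order when a) is called twice, page b) continuing the same sequence) cannot be guaranteed. A possible way around this, given here only as a sketch and not tested against Lucene, is to derive each document's sort key from a fixed seed and its doc ID. The example below uses plain Java with no Lucene classes. The class name `SeededRandomPaging`, the helpers `key`, `order` and `page`, and the choice of a SplitMix64-style mixing function are all my own assumptions for illustration. It also assumes doc IDs stay stable between calls, which is not the case in a real index after merges or deletions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class SeededRandomPaging {

    // Hypothetical helper: mixes seed and doc ID into a pseudo-random 64-bit key
    // (SplitMix64-style finalizer). The same seed and doc always give the same key.
    static long key(long seed, int doc) {
        long z = seed + (doc + 1L) * 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // A consistent total order on doc IDs: by key first, then by doc ID as a tie-break.
    static Comparator<Integer> order(long seed) {
        return (a, b) -> {
            int c = Long.compare(key(seed, a), key(seed, b));
            return c != 0 ? c : Integer.compare(a, b);
        };
    }

    // Returns hits [offset, offset + size) of the seeded order over docs 0..numDocs-1.
    // Only offset + size doc IDs are held in memory at any time (bounded heap).
    static List<Integer> page(long seed, int numDocs, int offset, int size) {
        int keep = offset + size;
        Comparator<Integer> order = order(seed);
        // Reversed order puts the largest key at the head, so poll() evicts it.
        PriorityQueue<Integer> heap = new PriorityQueue<>(Math.max(1, keep), order.reversed());
        for (int doc = 0; doc < numDocs; doc++) {
            heap.add(doc);
            if (heap.size() > keep) {
                heap.poll();
            }
        }
        List<Integer> top = new ArrayList<>(heap);
        top.sort(order);
        return new ArrayList<>(top.subList(Math.min(offset, top.size()), top.size()));
    }

    public static void main(String[] args) {
        long seed = 42L; // keep the seed fixed for as long as the user pages through the results
        System.out.println("Page 1: " + page(seed, 1000, 0, 10));
        System.out.println("Page 2: " + page(seed, 1000, 10, 10));
        System.out.println("Page 1 again: " + page(seed, 1000, 0, 10));
    }
}
```

Because the key depends only on the seed and the doc ID, every comparison of the same two documents gives the same answer, so no document can show up twice and repeated calls return the same page. Changing the seed gives a different order. The cost is that fetching page n still scans all matching documents and keeps n pages' worth of doc IDs in the heap, much like deep paging with any other sort.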