 

How fast is Whoosh?

Whoosh is a fast, featureful full-text indexing and searching library implemented in pure Python (official website).

But I cannot find any speed/performance comparison with other search engines, especially Lucene-based ones (PyLucene, Lupyne, ...).

I'm used to using PyLucene, which is known to be fast but is quite non-Pythonic and not easy to handle (it is a direct wrapper around Java Lucene). There is a Pythonic wrapper around PyLucene, Lupyne; however, it is not convenient when core Lucene features are needed.

Any performance comparison between Whoosh and other engines would be appreciated.

asked Mar 17 '15 at 15:03 by dtrckd



1 Answer

Whoosh vs. Xappy/Xapian

There is a benchmark suite for Python search libraries, with support for Whoosh and Xappy/Xapian, here.

The Whoosh authors used these benchmarks to test Whoosh against Xappy/Xapian (ref):

How the benchmark works

N documents are generated. Each has a random 10-character search word plus 10 extra fields of 100 random characters each (just to pump up the document size).

For indexing, all fields are indexed and stored.

For searching, all words are searched in random order and all stored fields are retrieved.
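To make the methodology concrete, here is a rough, standard-library-only sketch of the benchmark loop as described: generate the documents, index and store every field, then search for every word in random order and retrieve the stored fields. It is not the actual benchmark script and does not use Whoosh or Xapian; `ToyIndex` is a hypothetical in-memory inverted index standing in for the engine under test, and the parameter names are simply copied from the printed `Params:` block. Its timings illustrate the measurement method only.

```python
import random
import string
import time
from collections import defaultdict

# Parameters mirroring the printed benchmark "Params:" block.
DOC_COUNT = 3000
WORD_LEN = 10
EXTRA_FIELD_COUNT = 10
EXTRA_FIELD_LEN = 100


def random_text(rng, length):
    """Return `length` random lowercase letters."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def make_docs(rng, n):
    """Generate n docs: one random search word plus padding fields."""
    docs = []
    for _ in range(n):
        doc = {"word": random_text(rng, WORD_LEN)}
        for i in range(EXTRA_FIELD_COUNT):
            doc[f"extra{i}"] = random_text(rng, EXTRA_FIELD_LEN)
        docs.append(doc)
    return docs


class ToyIndex:
    """Hypothetical stand-in engine: every field is indexed and stored."""

    def __init__(self):
        self.postings = defaultdict(set)  # (field, term) -> doc ids
        self.stored = []                  # doc id -> stored fields

    def add_document(self, doc):
        doc_id = len(self.stored)
        for field, value in doc.items():
            for term in value.split():
                self.postings[(field, term)].add(doc_id)
        self.stored.append(dict(doc))

    def search(self, field, term):
        """Return the stored fields of every matching document."""
        ids = self.postings.get((field, term), ())
        return [self.stored[d] for d in sorted(ids)]


def run_benchmark(n=DOC_COUNT, seed=0):
    """Time indexing and searching; return (index_secs, search_secs, hits)."""
    rng = random.Random(seed)
    docs = make_docs(rng, n)

    start = time.perf_counter()
    index = ToyIndex()
    for doc in docs:
        index.add_document(doc)
    index_secs = time.perf_counter() - start

    words = [doc["word"] for doc in docs]
    rng.shuffle(words)  # search every word, in random order
    start = time.perf_counter()
    hits = sum(len(index.search("word", w)) for w in words)
    search_secs = time.perf_counter() - start
    return index_secs, search_secs, hits


if __name__ == "__main__":
    idx_s, srch_s, _ = run_benchmark()
    print(f"Indexing takes {idx_s:.1f}s ({DOC_COUNT / idx_s:.1f}/s)")
    print(f"Searching takes {srch_s:.1f}s ({DOC_COUNT / srch_s:.1f}/s)")
```

Swapping `ToyIndex` for a real engine's writer and searcher is what turns this outline into a meaningful comparison.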

For Whoosh, we used the multiprocessing writer to build the index; this explains why it is faster than Xappy for indexing (it used all 4 cores, not just 1).
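In Whoosh 2.x the multiprocessing writer is requested by passing `procs` to the writer, e.g. `ix.writer(procs=4)` (see the batch-indexing tips in the Whoosh documentation). The sketch below does not use Whoosh; it only illustrates the general idea with the standard library, assuming a simple whitespace-tokenized inverted index: split the documents into chunks, build partial indexes in worker processes, then merge them. `index_chunk` and `parallel_index` are hypothetical names, and this is not how Whoosh's writer is implemented internally.

```python
import multiprocessing
from collections import defaultdict


def index_chunk(chunk):
    """Build a partial inverted index for one (start_id, docs) chunk."""
    start_id, docs = chunk
    postings = defaultdict(list)
    for offset, text in enumerate(docs):
        for term in set(text.split()):  # one posting per doc per term
            postings[term].append(start_id + offset)
    return dict(postings)


def parallel_index(docs, procs=4):
    """Index `docs` in `procs` worker processes, then merge the results."""
    size = max(1, -(-len(docs) // procs))  # ceiling division
    chunks = [(i, docs[i:i + size]) for i in range(0, len(docs), size)]
    # The "fork" start method keeps this sketch simple on POSIX systems;
    # it is not available on Windows.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(procs) as pool:
        partials = pool.map(index_chunk, chunks)
    merged = defaultdict(list)
    for part in partials:  # map() preserves chunk order, so ids stay sorted
        for term, ids in part.items():
            merged[term].extend(ids)
    return dict(merged)


if __name__ == "__main__":
    docs = ["apple banana", "banana cherry", "cherry apple apple", "date"]
    print(parallel_index(docs, procs=2))
```

The merge step is cheap relative to tokenizing, which is why indexing scales with the number of cores while a single query (as in the search phase here) gets no such benefit.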

For searching, Xappy/Xapian is faster (no parallel processing was used). But as you can see, the speed difference between Xappy and Whoosh is perhaps not as big as you might have expected.

Index size: about 12 MB

# Phenom II X4 840, 8GB RAM, HDD
# Python 2.7.2+ (default, Oct  4 2011, 20:06:09) 
# [GCC 4.6.1] on linux2

Params:
DOC_COUNT: 3000 WORD_LEN: 10
EXTRA_FIELD_COUNT: 10 EXTRA_FIELD_LEN: 100

Benchmarking: xappy 0.5 / xapian 1.2.5
Indexing takes 2.8s (1068.9/s)
Searching takes 0.5s (6635.8/s)

Benchmarking: whoosh 2.3.2
Indexing takes 0.8s (3575.6/s)
Searching takes 0.8s (3714.8/s)
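Each rate appears to be DOC_COUNT divided by the elapsed time (3000 / 2.8 s is about 1071 docs/s; the printed 1068.9 differs slightly because the printed time is rounded). The short calculation below uses only the numbers printed above to recover the unrounded times and the relative speeds: Whoosh indexed roughly 3.3x faster (using 4 processes), while Xappy/Xapian searched roughly 1.8x faster.

```python
DOC_COUNT = 3000

# Rates (documents per second) as printed in the benchmark output.
rates = {
    ("xappy", "index"): 1068.9,
    ("xappy", "search"): 6635.8,
    ("whoosh", "index"): 3575.6,
    ("whoosh", "search"): 3714.8,
}


def implied_seconds(rate, n=DOC_COUNT):
    """Unrounded wall time implied by a docs-per-second rate."""
    return n / rate


# Ratios of throughputs: >1 means the first engine is faster.
index_speedup = rates[("whoosh", "index")] / rates[("xappy", "index")]
search_speedup = rates[("xappy", "search")] / rates[("whoosh", "search")]

if __name__ == "__main__":
    for (engine, phase), rate in rates.items():
        print(f"{engine:6} {phase:6} {implied_seconds(rate):.3f}s")
    print(f"Whoosh indexes {index_speedup:.2f}x faster than Xappy")
    print(f"Xappy searches {search_speedup:.2f}x faster than Whoosh")
```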
answered Nov 02 '22 at 14:11 by Assem
