Whoosh is a fast, featureful full-text indexing and searching library implemented in pure Python (official website).
But I cannot find any speed or performance comparison with other search engines, especially Lucene-based ones (pyLucene, Lupyne...).
I'm used to pyLucene, which is known to be fast but is quite non-Pythonic and not easy to handle (it is a direct wrapper around Java Lucene). There is a Pythonic wrapper for pyLucene, Lupyne. However, it is not convenient when core Lucene features are needed.
Any performance hints comparing Whoosh with other engines would be appreciated.
{1} Whoosh vs Xappy/Xapian
Benchmarks for Python search that support both Whoosh and Xappy/Xapian are available here.
The Whoosh authors used those benchmarks to test Whoosh against Xappy/Xapian (ref):
How the benchmark works
N documents are generated. Each document has a search word (a random word, 10 chars long) plus 10 extra fields with 100 chars of random stuff each (just to pump up the size of the document).
For indexing, all fields are indexed and stored.
For searching, all words are searched in random order and all stored fields are retrieved.
For whoosh, we used the multiprocessing writer for building the index - this explains why it is faster for indexing than xappy (because it used all 4 cores, not just 1).
For searching, xappy/xapian is faster (there was no parallel processing used). But you see that the speed difference between xappy and whoosh is maybe not as big as you expected.
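The procedure described above can be sketched in plain Python. This is only an illustration of the document generation and timing loop, not the original benchmark script: the function names are made up here, and a plain dict stands in for the real index, so the numbers it prints say nothing about Whoosh or Xapian. To measure a real engine you would swap `build_index` and `search_all` for calls into that engine.

```python
import random
import string
import time

# Parameters taken from the benchmark description.
DOC_COUNT = 3000
WORD_LEN = 10
EXTRA_FIELD_COUNT = 10
EXTRA_FIELD_LEN = 100


def random_text(length, rng):
    """Return a random lowercase string of the given length."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def generate_docs(count, rng):
    """Build documents: one search word plus padding fields."""
    docs = []
    for _ in range(count):
        doc = {"word": random_text(WORD_LEN, rng)}
        for i in range(EXTRA_FIELD_COUNT):
            doc["extra%d" % i] = random_text(EXTRA_FIELD_LEN, rng)
        docs.append(doc)
    return docs


def timed(label, func, count):
    """Run func once and report elapsed time and items per second."""
    start = time.perf_counter()
    result = func()
    elapsed = time.perf_counter() - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    print("%s takes %.1fs (%.1f/s)" % (label, elapsed, rate))
    return result


def build_index(docs):
    """Stand-in index (assumption): maps each search word to its documents."""
    index = {}
    for doc in docs:
        index.setdefault(doc["word"], []).append(doc)
    return index


def search_all(index, words):
    """Look up every word and read back all stored fields of each hit."""
    hits = 0
    for word in words:
        for doc in index.get(word, []):
            _ = list(doc.values())  # simulate retrieving stored fields
            hits += 1
    return hits


rng = random.Random(0)
docs = generate_docs(DOC_COUNT, rng)
index = timed("Indexing", lambda: build_index(docs), DOC_COUNT)

# Search all words in random order, as in the description.
words = [doc["word"] for doc in docs]
rng.shuffle(words)
hits = timed("Searching", lambda: search_all(index, words), DOC_COUNT)
```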
Index Size about 12MB
# Phenom II X4 840, 8GB RAM, HDD
# Python 2.7.2+ (default, Oct 4 2011, 20:06:09)
# [GCC 4.6.1] on linux2
Params:
DOC_COUNT: 3000 WORD_LEN: 10
EXTRA_FIELD_COUNT: 10 EXTRA_FIELD_LEN: 100
Benchmarking: xappy 0.5 / xapian 1.2.5
Indexing takes 2.8s (1068.9/s)
Searching takes 0.5s (6635.8/s)
Benchmarking: whoosh 2.3.2
Indexing takes 0.8s (3575.6/s)
Searching takes 0.8s (3714.8/s)
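To put the figures above in relative terms, the per-second rates can simply be divided. The snippet below only uses the numbers from the output above; the variable names are mine.

```python
# Rates (operations per second) copied from the benchmark output above.
xappy_index_rate = 1068.9
xappy_search_rate = 6635.8
whoosh_index_rate = 3575.6
whoosh_search_rate = 3714.8

# Whoosh indexed with 4 cores, xappy with 1, so this ratio is not per-core.
index_ratio = whoosh_index_rate / xappy_index_rate
search_ratio = xappy_search_rate / whoosh_search_rate

print("whoosh indexing is %.1fx faster than xappy" % index_ratio)
print("xappy searching is %.1fx faster than whoosh" % search_ratio)
```

So on this small index, Whoosh indexed roughly 3.3 times faster (with 4 processes against 1) and Xappy/Xapian searched roughly 1.8 times faster.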