I'm using Haystack with Whoosh as the backend for a Django app.
Is there any way to view the content of the indexes generated by Whoosh in an easy-to-read format? I'd like to see what data was indexed, and how, so I can better understand how it works.
A Whoosh filedb index is really a container for one or more “sub-indexes” called segments.
Then when you search the index, Whoosh searches both segments individually and merges the results so the segments appear to be one unified index. (This smart design is copied from Lucene.) So, having a few segments is more efficient than rewriting the entire index every time you add some documents.
The argument can be a whoosh.query.Query object, a whoosh.searching.Results object, or a set-like object containing document numbers.
By default, Whoosh uses the results order (score or sort key) to determine the documents to collapse. For example, in scored results, the best scoring documents would be kept. You can optionally specify a collapse_order facet to control which documents to keep when collapsing.
You can do this pretty easily from Python's interactive console:
>>> from whoosh.index import open_dir
>>> ix = open_dir('whoosh_index')
>>> ix.schema
<<< <Schema: ['author', 'author_exact', 'content', 'django_ct', 'django_id', 'id', 'lexer', 'lexer_exact', 'published', 'published_exact']>
You can perform search queries directly on your index and do all sorts of fun stuff. To get every document I could do this:
>>> from whoosh.query import Every
>>> results = ix.searcher().search(Every('content'), limit=None)  # limit=None: search() returns only the top 10 hits by default
If you wanted to print it all out (for viewing or whatnot), you could do so pretty easily using a Python script.
for result in results:
    # Each hit exposes its rank plus the stored fields by name.
    print("Rank: %s Id: %s Author: %s" % (result.rank, result['id'], result['author']))
    print("Content:")
    print(result['content'])
You could also return the documents directly from Whoosh in a Django view (perhaps for pretty formatting with Django's template system). Refer to the Whoosh documentation for more info: http://packages.python.org/Whoosh/index.html.
from whoosh.index import open_dir
ix = open_dir('whoosh_index')
list(ix.searcher().documents())  # documents() returns a generator; list() shows the stored fields of every document in the index
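Since documents() yields one dict of stored fields per document, a small helper can turn that into something easier to read. This is only a rough sketch: format_documents, its parameters, and the sample field values are made up for illustration and are not part of Whoosh or Haystack.

```python
def format_documents(docs, fields=None, max_width=60):
    """Render stored-field dicts as readable text blocks.

    docs: iterable of dicts, such as the ones searcher.documents() yields.
    fields: optional list of field names to show (default: all, sorted).
    max_width: longer values are truncated to this many characters.
    """
    blocks = []
    for number, doc in enumerate(docs, start=1):
        names = fields if fields is not None else sorted(doc)
        lines = ["Document %d" % number]
        for name in names:
            value = str(doc.get(name, ""))
            # Truncate long values so each document stays compact.
            if len(value) > max_width:
                value = value[:max_width - 3] + "..."
            lines.append("  %s: %s" % (name, value))
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


# Hypothetical sample data standing in for real index contents.
sample = [
    {"id": "blog.post.1", "author": "alice", "content": "First post"},
    {"id": "blog.post.2", "author": "bob", "content": "x" * 100},
]
print(format_documents(sample, fields=["id", "author", "content"]))
```

With a real index you would pass ix.searcher().documents() instead of sample. Only fields marked as stored in the schema will show up; fields that are indexed but not stored are searchable but not returned this way.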