
Whoosh index viewer

I'm using Haystack with Whoosh as the backend for a Django app.

Is there any way to view the content of the indexes generated by Whoosh in an easy-to-read format? I'd like to see what data was indexed, and how, so I can better understand how it works.

daniels asked Mar 07 '10 08:03



2 Answers

You can do this pretty easily from python's interactive console:

>>> from whoosh.index import open_dir
>>> ix = open_dir('whoosh_index')
>>> ix.schema
<Schema: ['author', 'author_exact', 'content', 'django_ct', 'django_id', 'id', 'lexer', 'lexer_exact', 'published', 'published_exact']>
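
The schema repr only lists field names. To see how each field is configured, a sketch like the following may help. It assumes the index lives in a directory named 'whoosh_index' as above; describe_schema and show_schema are hypothetical helper names, not part of Whoosh.

```python
def describe_schema(schema):
    """Return one line per field: name, field type, and whether it is stored."""
    lines = []
    # Schema.items() yields (field name, field object) pairs.
    for name, field in schema.items():
        stored = getattr(field, "stored", False)
        lines.append("%s: %s (stored=%s)" % (name, type(field).__name__, stored))
    return lines


def show_schema(index_dir="whoosh_index"):
    # Imported here so describe_schema can be tried without Whoosh installed.
    from whoosh.index import open_dir

    ix = open_dir(index_dir)
    print("Documents in index: %d" % ix.doc_count())
    for line in describe_schema(ix.schema):
        print(line)
```

Calling show_schema('whoosh_index') from the console should print the document count followed by one line per field. Fields reported as stored=False are searchable but cannot be read back from a result.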

You can perform search queries directly on your index and do all sorts of fun stuff. To get every document I could do this:

>>> from whoosh.query import Every
>>> results = ix.searcher().search(Every('content'), limit=None)  # limit=None returns all hits, not just the default top 10

If you wanted to print it all out (for viewing or whatnot), you could do so pretty easily using a python script.

for result in results:
    # result[...] only works for fields that are stored in the index
    print("Rank: %s Id: %s Author: %s" % (result.rank, result['id'], result['author']))
    print("Content:")
    print(result['content'])
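
A slightly more general sketch of the same loop prints every stored field instead of a hard-coded few. format_document and dump_index are hypothetical helper names, and the 'whoosh_index' directory is assumed from the example above. Only stored fields are available on a hit, so a field that is indexed but not stored will not appear.

```python
def format_document(fields):
    """Format a dict of stored fields as 'name: value' lines, sorted by name."""
    return "\n".join("%s: %s" % (name, fields[name]) for name in sorted(fields))


def dump_index(index_dir="whoosh_index"):
    # Imported here so format_document can be tried without Whoosh installed.
    from whoosh.index import open_dir
    from whoosh.query import Every

    ix = open_dir(index_dir)
    # The with-block closes the searcher once the loop is done.
    with ix.searcher() as searcher:
        # Every() matches all documents; limit=None lifts the default cap of 10 hits.
        for hit in searcher.search(Every(), limit=None):
            print("--- document %d ---" % hit.docnum)
            print(format_document(hit.fields()))
```

Running dump_index('whoosh_index') should print each document as a block of "field: value" lines.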

You could also return the documents directly from Whoosh in a Django view, perhaps using Django's template system for pretty formatting. Refer to the Whoosh documentation for more info: http://packages.python.org/Whoosh/index.html.

zeekay answered Sep 19 '22 20:09


from whoosh.index import open_dir
ix = open_dir('whoosh_index')
# documents() returns a generator of stored-field dicts; list() makes the contents visible
list(ix.searcher().documents())  # shows all documents in the index
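
Stored fields show what was saved, but not how the text was tokenized. To look at the indexed terms themselves, a sketch along these lines may help. It assumes a text field named 'content' (as in the schema shown in the other answer) and relies on the reader's most_frequent_terms method, which returns (frequency, term) pairs; top_terms and show_terms are hypothetical helper names.

```python
def top_terms(reader, fieldname="content", number=10):
    """Return (term, frequency) pairs for the most frequent terms in a field."""
    pairs = []
    # most_frequent_terms returns (frequency, term) tuples; terms may be bytes.
    for freq, term in reader.most_frequent_terms(fieldname, number=number):
        if isinstance(term, bytes):
            term = term.decode("utf-8")
        pairs.append((term, freq))
    return pairs


def show_terms(index_dir="whoosh_index"):
    # Imported here so top_terms can be tried without Whoosh installed.
    from whoosh.index import open_dir

    reader = open_dir(index_dir).reader()
    try:
        for term, freq in top_terms(reader):
            print("%-20s %s" % (term, freq))
    finally:
        # Always release the reader's file handles.
        reader.close()
```

Calling show_terms('whoosh_index') should list the most common terms in the 'content' field after analysis, which gives a rough picture of how the text was split and normalized.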

Collin Anderson answered Sep 20 '22 20:09