How to print out the inverted index created by elasticsearch?

If I wanted to get all the tokens of the index that elasticsearch creates (I'm using the rails elasticsearch gem), how would I go about doing that? Doing something like this only gets a particular set of tokens for a search term:

curl -XGET 'http://localhost:9200/development_test/_analyze?text=John%20Smith'
Nona asked Oct 09 '14 02:10



1 Answer

You can combine the Scroll API with the Term Vectors API to enumerate terms in the inverted index:

require "elastomer/client"
require "set"

client = Elastomer::Client.new({ :url => "http://localhost:9200" })
index = "someindex"
type = "sometype"
field = "somefield"

terms = Set.new

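# Scroll over every document of the type, fetching its term vectors one at a time.
client.scan(nil, :index => index, :type => type).each_document do |document|
  term_vectors = client.index(index).docs(type).termvector({ :fields => field, :id => document["_id"] })["term_vectors"]
  # Documents that lack the field have no entry for it.
  next unless term_vectors.key?(field)

  term_vectors[field]["terms"].each_key do |term|
    # Set#add? returns nil for a term already seen, so each term prints once.
    puts(term) if terms.add?(term)
  end
end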

This is rather slow and wasteful since it performs a _termvectors HTTP request for every single document in the index, holds all the terms in RAM, and keeps a scroll context open for the duration of enumeration. However, this doesn't require another tool like Luke and the terms can be streamed out of the index.
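The term-collecting step can be tried without a running cluster. Below is a minimal sketch, assuming each Term Vectors response is already parsed into a Hash shaped like the "term_vectors" value used above. The helper name each_new_term and the sample responses are made up for illustration and are not part of elastomer-client.

```ruby
require "set"

# Hypothetical helper (not part of elastomer-client): yields each term of
# `field` the first time it is seen across a series of parsed Term Vectors
# responses, and returns the set of all terms seen.
def each_new_term(responses, field, seen = Set.new)
  responses.each do |response|
    # Documents without the field have no entry under "term_vectors".
    field_vectors = response.fetch("term_vectors", {})[field]
    next unless field_vectors

    field_vectors["terms"].each_key do |term|
      # Set#add? returns nil when the term is already present.
      yield term if seen.add?(term)
    end
  end
  seen
end

# Hand-written sample responses (assumed shape, not real cluster output).
sample_responses = [
  { "term_vectors" => { "somefield" => { "terms" => { "john" => {}, "smith" => {} } } } },
  { "term_vectors" => {} },
  { "term_vectors" => { "somefield" => { "terms" => { "smith" => {}, "jane" => {} } } } }
]

collected = []
each_new_term(sample_responses, "somefield") { |term| collected << term }
puts collected
```

Because the helper yields terms as they are found, the caller can stream them out, although the set of seen terms still has to stay in memory to suppress duplicates.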

Chris Wendt answered Oct 18 '22 03:10
