Elasticsearch: Get phrase frequency in a given document

Question

Test data:

curl -XPUT 'localhost:9200/customer/external/1?pretty' -d '{ "body": "this is a test" }'
curl -XPUT 'localhost:9200/customer/external/2?pretty' -d '{ "body": "and this is another test" }'
curl -XPUT 'localhost:9200/customer/external/3?pretty' -d '{ "body": "this thing is a test" }'

My goal is to get the frequency of a phrase in a document.

I know how to get the frequency of the terms in a document:

curl -g "http://localhost:9200/customer/external/1/_termvectors?pretty" -d'
{
        "fields": ["body"],
        "term_statistics" : true
}'

And I know how to count the documents that contain a given phrase (with a match_phrase or span_near query):

curl -g "http://localhost:9200/customer/_count?pretty" -d'
{
  "query": {
    "match_phrase": {
      "body" : "this is"
      }
    }    
}'

How can I get the frequency of a phrase within a single document?

Lupanoide · Accepted Answer

You can use termvectors. As the documentation says:

Return values

Three types of values can be requested: term information, term statistics and field statistics. By default, all term information and field statistics are returned for all fields but no term statistics.

Term information

term frequency in the field (always returned)
term positions (positions : true)
start and end offsets (offsets : true)
term payloads (payloads : true), as base64 encoded bytes

What you need is the term frequency: the example in the documentation shows the frequency for john doe in the document. Be aware that storing term vectors for a field roughly doubles the disk space that field occupies.
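Because termvectors reports frequencies per term rather than per phrase, one way to get a phrase count is to combine the term positions on the client side. The following is only a minimal sketch, assuming the response was requested with positions enabled, that the field uses an analyzer which keeps every word (such as the standard analyzer), and that the phrase has already been split into the same tokens the analyzer produces. The function name phrase_frequency and the hand-written sample_response are illustrative and not part of Elasticsearch.

```python
def phrase_frequency(termvectors_response, field, phrase_tokens):
    """Count how often phrase_tokens occur at consecutive positions in one document."""
    terms = termvectors_response["term_vectors"][field]["terms"]

    # Collect, for each token of the phrase, the set of positions where it occurs.
    position_sets = []
    for token in phrase_tokens:
        entry = terms.get(token)
        if entry is None:
            return 0  # a token that is absent from the document means no match
        position_sets.append({t["position"] for t in entry["tokens"]})

    # The phrase starts at position p if token i sits at position p + i for every i.
    return sum(
        1
        for start in position_sets[0]
        if all(start + i in positions for i, positions in enumerate(position_sets))
    )


# Hand-written sample (an assumption, not captured output) shaped like a
# _termvectors response for the document { "body": "this is a test" }.
sample_response = {
    "term_vectors": {
        "body": {
            "terms": {
                "this": {"term_freq": 1, "tokens": [{"position": 0}]},
                "is": {"term_freq": 1, "tokens": [{"position": 1}]},
                "a": {"term_freq": 1, "tokens": [{"position": 2}]},
                "test": {"term_freq": 1, "tokens": [{"position": 3}]},
            }
        }
    }
}

print(phrase_frequency(sample_response, "body", ["this", "is"]))  # prints 1
```

This only handles exact, adjacent matches; allowing gaps between words (as span_near does with a slop) would need a looser position check.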

Tags:

elasticsearch

elasticsearch-5

Gilles Cuyaubere
