Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I get total count of each words in elasticsearch document?

I searched about the question but couldn't find any useful answer. I want to get the total count for each word in a document, for example I have some tweets in my indices and there is a tweet that says something like this “It is so boring here I want to go to my home sweet home”. The query should return the response like this:

It:1
is:1
so:1
boring:1
here:1
I:1
want:1
to:2
go:1
my:1
home:2
sweet:1

Is it possible to do that?

like image 569
Fatih Aktepe Avatar asked Aug 10 '15 07:08

Fatih Aktepe


1 Answers

You're looking for term vectors, which leverages analyzers. As as it do so, you can define any analyzer you need, i.e. stemming analyzer to transform words to root/normal form. Take a look at documentation for further details.

In:

POST so/_close
PUT so/_settings
{
  "settings": {
    "analysis":{ 
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stemmer"]
        }
      },
      "filter": {
        "my_stemmer": {
          "type": "stemmer",
          "name": "english"
        }
      }
    }
  }
}
POST so/_open
PUT so/t1/_mapping
{
  "t1": {
    "properties": {
      "tweet": {
        "type": "string",
        "store": true,
        "index_analyzer": "my_analyzer"
      }
    }
  }
}
POST so/t1/1
{"tweet": "It is so boring here I want to go to my home sweet home. So I'm bored"}

Out:

{
   "_index": "so",
   "_type": "t1",
   "_id": "1",
   "_version": 2,
   "found": true,
   "term_vectors": {
      "tweet": {
         "field_statistics": {
            "sum_doc_freq": 13,
            "doc_count": 1,
            "sum_ttf": 17
         },
         "terms": {
            "bore": {
               "term_freq": 2,
               ...
            },
            "go": {
               "term_freq": 1,
               ...
            },
            "here": {
               "term_freq": 1,
               ...
            },
            "home": {
               "term_freq": 2,
               ...
            },
            "i": {
               "term_freq": 1,
               ...
            },
            "i'm": {
               "term_freq": 1,
               ...
            },
            "is": {
               "term_freq": 1,
               ...
            },
            "it": {
               "term_freq": 1,
               ...
            },
            "my": {
               "term_freq": 1,
               ...
            },
            "so": {
               "term_freq": 2,
               ...
            },
            "sweet": {
               "term_freq": 1,
               ...
            },
            "to": {
               "term_freq": 2,
               ...
            },
            "want": {
               "term_freq": 1,
               ...
            }
         }
      }
   }
}
like image 136
Slam Avatar answered Nov 15 '22 10:11

Slam