Is it possible to return the analyzed fields in an ElasticSearch >2.0 search?

Tags:

lucene

elasticsearch

nlp

This question is very similar to an older one posted here: Retrieve analyzed tokens from ElasticSearch documents. I thought it made sense to ask again for the latest version of ElasticSearch, in case anything has changed.

We are searching bodies of text in ElasticSearch, with a search query and field mapping that use the snowball stemmer built into ElasticSearch. Performance and results are great, but because we need the stemmed text body for post-analysis, we would like each document in the search results to include the actual stemmed tokens of its text field.

The mapping for the field currently looks like:

      "TitleEnglish": {
        "type": "string",
        "analyzer": "standard",
        "fields": {
          "english": {
            "type": "string",
            "analyzer": "english"
          },
          "stemming": {
            "type": "string",
            "analyzer": "snowball"
          }
        }
      }

and the search query is run specifically against TitleEnglish.stemming. Ideally I would like the response to include that field, but requesting it returns the original text rather than the analyzed tokens.

Does anybody know of a way to do this? We have looked at term vectors, but they seem to be retrievable only for a single document or a given set of documents, not as part of a search result.

Or do other solutions, such as Solr or Sphinx, offer this option?
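One way to bridge the gap with term vectors is a two-step round trip: run the search, then pass the IDs of the hits to the multi term vectors API (`_mtermvectors`) in a single request. The sketch below only builds that request body as a plain Python dict. The `ids` plus `parameters` shape follows the multi term vectors API; the helper name `build_mtermvectors_body` and the sample hit IDs are hypothetical. Whether term vectors can be computed on the fly for a multi-field such as `TitleEnglish.stemming`, rather than stored at index time via `term_vector` in the mapping, is an assumption worth checking against your Elasticsearch version.

```python
import json


def build_mtermvectors_body(search_response, field="TitleEnglish.stemming"):
    """Turn a search response into an _mtermvectors request body.

    Hypothetical helper: collects the _id of every hit and asks only for
    term positions, skipping the statistics we do not need here.
    """
    ids = [hit["_id"] for hit in search_response["hits"]["hits"]]
    return {
        "ids": ids,
        "parameters": {
            "fields": [field],
            "positions": True,
            "offsets": False,
            "term_statistics": False,
            "field_statistics": False,
        },
    }


# Illustrative search response, trimmed to the parts the helper reads.
search_response = {"hits": {"hits": [{"_id": "1"}, {"_id": "2"}]}}
body = build_mtermvectors_body(search_response)
# POST this body to the index's _mtermvectors endpoint
# (e.g. /<index>/<type>/_mtermvectors on 2.x).
print(json.dumps(body, indent=2))
```

This costs one extra request per page of results rather than one per document, which is the main reason to prefer `_mtermvectors` over calling the single-document term vectors endpoint in a loop.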

Some extra information: if we run the following request:

GET /_analyze?analyzer=snowball&text=Eight issue of Industrial Lorestan eliminate barriers to facilitate the Committees review of

It returns the stemmed words: eight, issu, industri, etc. This is exactly the result we would like back for each matching document for all of the words in the text (so not just the matches).
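If term vectors are used, the per-document token stream (all words, not just the matches) can be rebuilt by inverting each document's `terms` map: every term lists the positions where it occurred, so sorting by position yields the stemmed text in its original order. This is a minimal sketch, assuming the term vectors were requested with positions (the default) and follow the usual `docs[].term_vectors.<field>.terms.<term>.tokens[].position` layout; the function name `stemmed_tokens_per_doc` and the sample data are hypothetical. Stop words dropped by the analyzer (the `_analyze` output above skips "of", for instance) just leave gaps in the positions.

```python
def stemmed_tokens_per_doc(mtermvectors_response, field="TitleEnglish.stemming"):
    """Map each document _id to its analyzed tokens, ordered by position."""
    result = {}
    for doc in mtermvectors_response["docs"]:
        # Documents that were not found carry no term_vectors key.
        terms = doc.get("term_vectors", {}).get(field, {}).get("terms", {})
        positioned = []
        for term, info in terms.items():
            # A term that occurs several times has one token entry per occurrence.
            for token in info.get("tokens", []):
                positioned.append((token["position"], term))
        # Sorting (position, term) pairs restores the original word order.
        positioned.sort()
        result[doc["_id"]] = [term for _, term in positioned]
    return result


# Hand-written sample in the term vectors response shape (not real output).
sample = {
    "docs": [
        {
            "_id": "1",
            "term_vectors": {
                "TitleEnglish.stemming": {
                    "terms": {
                        "industri": {"tokens": [{"position": 3}]},
                        "eight": {"tokens": [{"position": 0}]},
                        "issu": {"tokens": [{"position": 1}]},
                    }
                }
            },
        }
    ]
}
print(stemmed_tokens_per_doc(sample))  # {'1': ['eight', 'issu', 'industri']}
```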


asked Mar 16 '16 11:03

luckylwk

1 Answer

Unless I'm missing something obvious, why not simply return a terms aggregation on the TitleEnglish.stemming field?

{
    "query": {...},
    "aggs" : {
        "stems" : {
            "terms" : { 
                "field" : "TitleEnglish.stemming",
                "size": 50
            }
        }
    }
}

Adding that aggregation to your query, you'd get a breakdown of all the stemmed terms in the TitleEnglish.stemming sub-field from the documents that matched your query.
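A sketch of building that request and reading its result as plain Python dicts. Two caveats worth keeping in mind: the `terms` aggregation returns one bucket list for all matching documents combined, where each bucket's `doc_count` counts documents rather than occurrences, so it does not give a per-document token list; and `size` caps how many distinct stems come back. On Elasticsearch 5.x and later, aggregating on an analyzed `text` field also requires `fielddata` to be enabled in the mapping. The `match` query and the helper names below are hypothetical, since the answer elides the query as `{...}`.

```python
import json


def build_stems_search(query_text, size=50):
    """Search the stemmed sub-field and aggregate its terms (hypothetical helper)."""
    return {
        # Hypothetical query; the answer leaves the query part elided.
        "query": {"match": {"TitleEnglish.stemming": query_text}},
        "aggs": {
            "stems": {
                "terms": {"field": "TitleEnglish.stemming", "size": size}
            }
        },
    }


def stem_counts(search_response):
    """Return {stem: number of matching documents that contain it}."""
    buckets = search_response["aggregations"]["stems"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


print(json.dumps(build_stems_search("industrial issues"), indent=2))
```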

answered Sep 19 '22 14:09

Val
