
Elasticsearch: filtering documents based on field length


I've read a couple of similar questions on SO, but the suggested solutions don't work.
I want to find all documents where the word field is shorter than 8 characters.

My data looks like this:

[Image: screen capture of the database rows]

I tried to do this with the following query:

{
  "query": {
    "match_all": {}
  },
  "filter": {
    "script": {
      "script": "doc['word'].length < 5"
    }
  }
}

What am I doing wrong? Am I missing something?

m1l05z asked Dec 29 '13 22:12


1 Answer

Any field used in a script is loaded entirely into memory (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-scripting.html#_document_fields), so you may want to consider an alternative approach.

For example, you can use the regexp filter to match only terms up to a certain length, with a pattern like .{0,4} (at most four characters).
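To get a feel for how such a length-bounding pattern behaves before running it against a cluster, it can be tried locally. This is only a rough sketch: it uses grep rather than Elasticsearch, and relies on `grep -x` to require a whole-line match, which approximates the way the regexp filter has to match an entire term. The helper name `at_most_chars` is made up for this illustration.

```shell
#!/bin/bash

# Hypothetical helper (not part of Elasticsearch): reads words from stdin,
# one per line, and prints those with at most $1 characters.
# -E enables extended regular expressions; -x makes the pattern match the
# whole line, similar to how a regexp filter must match the whole term.
at_most_chars() {
    grep -E -x ".{0,$1}"
}

# The same three words that are indexed in the example further down.
printf '%s\n' bar barf zip | at_most_chars 3
```

With a limit of 3 this prints bar and zip but not barf. For the "shorter than 8" case from the question, the corresponding pattern would be .{0,7}.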

Here's a runnable example you can play with: https://www.found.no/play/gist/2dcac474797b0b2b952a

#!/bin/bash

export ELASTICSEARCH_ENDPOINT="http://localhost:9200"

# Index documents
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
{"index":{"_index":"play","_type":"type"}}
{"word":"bar"}
{"index":{"_index":"play","_type":"type"}}
{"word":"barf"}
{"index":{"_index":"play","_type":"type"}}
{"word":"zip"}
'

# Search for terms of at most three characters.
# This matches "bar" and "zip" but not "barf", which has four characters.
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
    "query": {
        "filtered": {
            "filter": {
                "regexp": {
                    "word": {
                        "value": ".{0,3}"
                    }
                }
            }
        }
    }
}
'

Alex Brasetvik answered Sep 27 '22 20:09