How do I do a partial match in Elasticsearch?

Tags: json, regex, url, parsing, elasticsearch

I have a link like http://drive.google.com and I want to match "google" out of the link.

I have:

query: {
    bool: {
        must: {
            match: { text: 'google' }
        }
    }
}

But this only matches if the whole text is 'google' (case insensitive, so it also matches Google or GooGlE etc.). How do I match 'google' inside another string?


asked Jun 08 '16 17:06

ThePumpkinMaster

1 Answer

The point is that Elasticsearch regular expressions require a full string match:

Lucene’s patterns are always anchored. The pattern provided must match the entire string.

Thus, to match any characters (except a newline) around the substring, use the .* pattern:

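'.*google.*'   (note the .* on both sides of google)

Put this pattern in a regexp query rather than a match query, since match does not interpret regular expression syntax:

"query": {
    "regexp": { "text": ".*google.*" }
}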
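If the substring comes from user input, characters such as . or + would be read as regex operators. Below is a minimal Python sketch, assuming the request body is built client-side as a plain dict (the helper names are hypothetical, not part of any Elasticsearch client). It escapes the characters the Elasticsearch regexp syntax docs list as reserved and wraps the literal in .* on both sides. Keep in mind that regexp is a term-level query: on an analyzed text field the anchored pattern has to match a single indexed token, not the original string.

```python
import json

# Characters listed as reserved in Elasticsearch regexp syntax, including the
# optional-operator ones (# @ & < > ~). A backslash makes the next character
# literal, so escaping a character that is not special in your setup is harmless.
RESERVED = set('.?+*|{}[]()"\\#@&<>~')


def escape_lucene_regex(text: str) -> str:
    """Backslash-escape every reserved character so `text` is matched literally."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in text)


def regexp_contains_query(field: str, substring: str) -> dict:
    """Build a regexp query body matching any term that contains `substring`.

    Lucene patterns are anchored, so the literal is wrapped in .* on both sides.
    """
    pattern = ".*" + escape_lucene_regex(substring) + ".*"
    return {"query": {"regexp": {field: pattern}}}


if __name__ == "__main__":
    # {"query": {"regexp": {"text": ".*google.*"}}}
    print(json.dumps(regexp_contains_query("text", "google")))
    # The dot in "drive.google" is escaped, so it only matches a literal dot.
    print(json.dumps(regexp_contains_query("text", "drive.google")))
```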

One more variation covers strings that can contain newlines: '(.|\n)*google(.|\n)*'. This awful (.|\n)* is a must in Elasticsearch because this regex flavor supports neither [\s\S] workarounds nor DOTALL/single-line flags: "The Lucene regular expression engine is not Perl-compatible but supports a smaller range of operators."
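To make the anchoring and newline behaviour concrete, here is a small sketch in Python. Note the assumption: Python's re is not Lucene's engine, and re.fullmatch is used only to mimic "the pattern must match the entire string". In Python, '.' does not match '\n' unless re.DOTALL is set, so without that flag the same (.|\n)* trick is needed.

```python
import re

url = "http://drive.google.com"
multiline = "first line\nhttp://drive.google.com\nlast line"


def anchored(pattern: str, text: str) -> bool:
    """True only if `pattern` matches the whole of `text` (Python re, no flags)."""
    return re.fullmatch(pattern, text) is not None


print(anchored(r"google", url))                      # False: only a substring
print(anchored(r".*google.*", url))                  # True
print(anchored(r".*google.*", multiline))            # False: '.' stops at '\n'
print(anchored(r"(.|\n)*google(.|\n)*", multiline))  # True
```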

However, if you do not need complex patterns or word boundary checks, a plain substring search is better done with a wildcard query than with a regex:

{
    "query": {
        "wildcard": {
            "text": {
                "value": "*google*",
                "boost": 1.0,
                "rewrite": "constant_score"
            }
        }
    }
}
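A minimal Python sketch that builds the wildcard query above for an arbitrary substring. It assumes backslash escaping of * and ? (Lucene's WildcardQuery treats \ as its escape character); the helper names are hypothetical and not part of any Elasticsearch client.

```python
import json


def escape_wildcard(text: str) -> str:
    """Backslash-escape the wildcard metacharacters * and ?, plus the backslash itself."""
    return "".join("\\" + ch if ch in "*?\\" else ch for ch in text)


def wildcard_contains_query(field: str, substring: str) -> dict:
    """Build a wildcard query body for 'field contains substring'."""
    return {
        "query": {
            "wildcard": {
                field: {
                    # Leading and trailing * because the pattern must cover the whole term.
                    "value": "*" + escape_wildcard(substring) + "*",
                    "boost": 1.0,
                    "rewrite": "constant_score",
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(wildcard_contains_query("text", "google"), indent=2))
```

The Elasticsearch documentation warns that patterns beginning with * or ? can slow searches down, because more terms have to be examined.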

See the Elasticsearch wildcard query documentation for more details.

NOTE: The wildcard pattern also has to match the whole input string, so:

google* finds all strings starting with google
*google* finds all strings containing google
*google finds all strings ending with google

Also, bear in mind that wildcard patterns have only two special characters:

? matches any single character
* matches zero or more characters, including none
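The anchored semantics of these two characters can be emulated locally. This is a Python sketch for illustration only: it translates a wildcard pattern to a regex and requires a full match; it is not how Elasticsearch evaluates the query internally (that happens against indexed terms). Using re.DOTALL so that ? and * also cover newlines is an assumption of this sketch, and the function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern: str) -> str:
    """Translate a wildcard pattern to a Python regex.

    Only '?' (exactly one character) and '*' (zero or more characters)
    are special; every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def wildcard_match(pattern: str, value: str) -> bool:
    # fullmatch: like the wildcard query, the pattern must cover the whole value.
    return re.fullmatch(wildcard_to_regex(pattern), value, re.DOTALL) is not None


if __name__ == "__main__":
    values = ["google.com", "drive.google.com", "mygoogle", "google"]
    for p in ["google*", "*google*", "*google", "go?gle"]:
        print(p, [v for v in values if wildcard_match(p, v)])
```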

answered Sep 19 '22 08:09

Wiktor Stribiżew
