
Find documents with an empty string value in Elasticsearch

I've been trying to use Elasticsearch to filter for only those documents that contain an empty string in their body. So far I'm having no luck.

Before I go on, I should mention that I've already tried the many "solutions" spread around the Interwebz and StackOverflow.

So, below is the query that I'm trying to run, followed by its counterparts:

    {
        "query": {
            "filtered": {
                "filter": {
                    "bool": {
                        "must_not": [
                            {
                                "missing": {
                                    "field": "_textContent"
                                }
                            }
                        ]
                    }
                }
            }
        }
    }

I've also tried the following:

    {
        "query": {
            "filtered": {
                "filter": {
                    "bool": {
                        "must_not": [
                            {
                                "missing": {
                                    "field": "_textContent",
                                    "existence": true,
                                    "null_value": true
                                }
                            }
                        ]
                    }
                }
            }
        }
    }

And the following:

    {
        "query": {
            "filtered": {
                "filter": {
                    "missing": {"field": "_textContent"}
                }
            }
        }
    }

None of the above worked. I get an empty result set even though I know for sure that there are records that contain an empty string field.

If anyone can provide me with any help at all, I'll be very grateful.

Thanks!

Paulo Victor asked Aug 29 '14 05:08




2 Answers

If you are using the default (standard) analyzer, an empty string produces no tokens, so there is nothing indexed for a term query to match. You therefore need to index the field verbatim (not analyzed). Here is an example:

First, add a mapping that indexes the field untokenized. If you also need a tokenized copy of the field indexed, you can use a Multi Field type.

    PUT http://localhost:9200/test/_mapping/demo
    {
      "demo": {
        "properties": {
          "_content": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      }
    }
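The Multi Field option mentioned above could look roughly like the following sketch. It assumes the Elasticsearch 1.x mapping syntax used in this answer (`string` type, `not_analyzed`). The sub-field name `raw` and the helper `multi_field_mapping` are hypothetical names chosen for illustration; they are not part of the original answer.

```python
import json


def multi_field_mapping(doc_type, field, raw_name="raw"):
    """Build a mapping body with an analyzed field plus a verbatim sub-field.

    Assumes Elasticsearch 1.x syntax; `raw_name` is a hypothetical sub-field name.
    """
    return {
        doc_type: {
            "properties": {
                field: {
                    # Main field: analyzed with the default analyzer (tokenized).
                    "type": "string",
                    "fields": {
                        # Sub-field: indexed verbatim, so "" is kept as a term.
                        raw_name: {"type": "string", "index": "not_analyzed"}
                    },
                }
            }
        }
    }


if __name__ == "__main__":
    # Body for: PUT http://localhost:9200/test/_mapping/demo
    print(json.dumps(multi_field_mapping("demo", "_content"), indent=2))
```

With a mapping like this, the term filter would presumably target `_content.raw` rather than `_content`, while full-text queries keep using the analyzed main field.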

Next, index a couple of documents.

    POST http://localhost:9200/test/demo/1/
    {
      "_content": ""
    }

    POST http://localhost:9200/test/demo/2
    {
      "_content": "some content"
    }

Execute a search:

    POST http://localhost:9200/test/demo/_search
    {
      "query": {
        "filtered": {
          "filter": {
            "term": {
              "_content": ""
            }
          }
        }
      }
    }

This returns the document with the empty string:

    {
        "took": 2,
        "timed_out": false,
        "_shards": {
            "total": 5,
            "successful": 5,
            "failed": 0
        },
        "hits": {
            "total": 1,
            "max_score": 0.30685282,
            "hits": [
                {
                    "_index": "test",
                    "_type": "demo",
                    "_id": "1",
                    "_score": 0.30685282,
                    "_source": {
                        "_content": ""
                    }
                }
            ]
        }
    }
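To illustrate why the mapping matters, here is a deliberately simplified Python model of the two indexing modes. It is not Elasticsearch's actual analyzer: `analyzed_terms` only approximates tokenization with a regular expression, and all three function names are hypothetical. The point it sketches is that an analyzed empty string yields no terms at all, while a not_analyzed field keeps the empty string as a single term that a term filter can match.

```python
import re


def analyzed_terms(text):
    """Very rough stand-in for an analyzed field: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def not_analyzed_terms(text):
    """Stand-in for a not_analyzed field: the whole value is a single term."""
    return [text]


def term_matches(indexed_terms, value):
    """A term filter matches only if the exact term is in the index."""
    return value in indexed_terms


if __name__ == "__main__":
    for doc in ["", "some content"]:
        # Show the terms each indexing mode would store for the value.
        print(repr(doc), analyzed_terms(doc), not_analyzed_terms(doc))
```

Under this model, `term_matches(analyzed_terms(""), "")` is false because no term was stored, whereas `term_matches(not_analyzed_terms(""), "")` is true.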
Dan Tuffery answered Sep 28 '22 23:09


I found a solution at https://github.com/elastic/elasticsearch/issues/7515 that works without reindexing.

    PUT t/t/1
    {
      "textContent": ""
    }

    PUT t/t/2
    {
      "textContent": "foo"
    }

    GET t/t/_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "exists": {
                "field": "textContent"
              }
            }
          ],
          "must_not": [
            {
              "wildcard": {
                "textContent": "*"
              }
            }
          ]
        }
      }
    }
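If the same check is needed for several fields, the query body above can be generated rather than written by hand. The following is a minimal sketch that only builds the request body as a Python dict; the helper name `empty_string_query` is hypothetical, and sending the request to a cluster is left out.

```python
import json


def empty_string_query(field):
    """Build the bool query from the answer above for an arbitrary field name."""
    return {
        "query": {
            "bool": {
                # The field must be present in the document ...
                "must": [{"exists": {"field": field}}],
                # ... but must have no indexed term at all ("*" matches any term).
                "must_not": [{"wildcard": {field: "*"}}],
            }
        }
    }


if __name__ == "__main__":
    # Body for: GET t/t/_search
    print(json.dumps(empty_string_query("textContent"), indent=2))
```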
Serhiy Demydenko answered Sep 28 '22 21:09