Combined non-Nested and Nested Query in Elasticsearch

Tags:

elasticsearch

I want to use Elasticsearch (ES) for a book search, so I decided to index the author name and the book titles (as nested documents) as follows:

curl -XPUT 'localhost:9200/library/search_books/1' -H 'Content-Type: application/json' -d '{
  "author": "one",
  "books": [
    { "title": "two" },
    { "title": "three" }
  ]
}'

What I don't understand is how to structure the search query so that searching for "one two" finds only book two, searching for "two three" finds nothing, and searching for "one" finds all books.


asked Mar 22 '13 18:03

fisch

2 Answers

Perhaps something like this?

{
  "query": {
    "bool": {
      "must": [
        { "term": { "author": "one" } },
        {
          "nested": {
            "path": "books",
            "query": {
              "term": { "books.title": "two" }
            }
          }
        }
      ]
    }
  }
}

This query says that a document must have author: one and books.title: two. You can adapt it easily: to search only by author, remove the nested clause; to search for a different book, change the term inside the nested clause.

This assumes you are using actual nested documents rather than inner objects. With inner objects you can simply use the fully qualified field path (books.title) without the special nested query.
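To make the "remove the nested part / change the nested part" advice concrete, here is a minimal Python sketch that builds the query body above as a dict. The helper name `build_book_query` and its `nested` flag are hypothetical; the field names `author` and `books.title` come from the question. It only constructs the JSON body and does not talk to a cluster.

```python
import json


def build_book_query(author, title=None, nested=True):
    """Build a bool query requiring `author` and, optionally, one book title.

    Hypothetical helper: pass nested=False when `books` is mapped as plain
    inner objects instead of the nested type.
    """
    must = [{"term": {"author": author}}]
    if title is not None:
        title_clause = {"term": {"books.title": title}}
        if nested:
            # Nested documents are indexed separately, so the title clause
            # has to be wrapped in a nested query on the "books" path.
            title_clause = {"nested": {"path": "books", "query": title_clause}}
        # With inner objects the fully qualified path works directly.
        must.append(title_clause)
    return {"query": {"bool": {"must": must}}}


if __name__ == "__main__":
    print(json.dumps(build_book_query("one", "two"), indent=2))
```

Calling `build_book_query("one")` gives the author-only search, and `build_book_query("one", "three")` swaps in a different book.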

Edit1: You could perhaps accomplish this with clever boosting at index time, although it will only be an approximate solution. If "author" is boosted heavily, documents matching the author will score higher than documents matching only the title, even when the title matches both parts of the query. You could then use a min_score cutoff to keep the title-only matches from being displayed.

It's only a loose approximation, since some unwanted matches may still creep through. It may also distort the relative ordering among the "correct" matches.
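The boosting idea can be sketched in Python as follows. Note that this swaps in query-time boosts (the `boost` parameter on a term query) instead of the index-time boosting described above, and combines them with the top-level `min_score` search parameter. The helper name and the `author_boost` and `min_score` values are hypothetical and would have to be tuned against real data, which is exactly why the result stays approximate.

```python
import json


def build_boosted_query(words, author_boost=10.0, min_score=None):
    """Sketch of the boosting idea using query-time boosts (not index-time ones).

    Each search word is tried against the author (heavily boosted) and against
    the nested book titles. `author_boost` and `min_score` are hypothetical
    tuning values.
    """
    should = []
    for word in words:
        # Author hits get a large boost so they dominate the score.
        should.append({"term": {"author": {"value": word, "boost": author_boost}}})
        # Title hits contribute only their normal score.
        should.append(
            {"nested": {"path": "books", "query": {"term": {"books.title": word}}}}
        )
    body = {"query": {"bool": {"should": should}}}
    if min_score is not None:
        # Top-level search parameter: hits scoring below it are dropped.
        body["min_score"] = min_score
    return body


if __name__ == "__main__":
    print(json.dumps(build_boosted_query("one two".split(), min_score=5.0), indent=2))
```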

Edit2: Updated using query_string to expose a "single input" option:

{
  "query": {
    "query_string": {
      "query": "+author:one +books.title:two"
    }
  }
}

That's assuming you are using default "inner objects". If you have real Nested types, the query_string becomes much, much more complex:

{
  "query": {
    "query_string": {
      "query": "+author:one +BlockJoinQuery (filtered(books.title:two)->cache(_type:__books))"
    }
  }
}

Huge disclaimer: I did not test either of these query_string queries, so they may not be exactly correct. But they show that the Lucene syntax is not particularly friendly.
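If you expose query_string as a single input, the user's values still need to be embedded safely in the Lucene syntax. Below is a minimal Python sketch for the inner-object version only. It assumes classic Lucene query syntax, where a value wrapped in double quotes is treated as a phrase and only `"` and `\` need escaping inside the quotes; quoting also keeps a multi-word title inside the `books.title` clause. The helper names are hypothetical.

```python
def lucene_phrase(value):
    """Quote a value as a Lucene phrase, escaping backslashes and double quotes."""
    # Escape backslashes first so the escapes added for quotes are not doubled.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_query_string(author, title):
    """Build the inner-object query_string body: +author:... +books.title:..."""
    q = f"+author:{lucene_phrase(author)} +books.title:{lucene_phrase(title)}"
    return {"query": {"query_string": {"query": q}}}


if __name__ == "__main__":
    print(build_query_string("one", "two"))
```

For "one" and "two" this produces `+author:"one" +books.title:"two"`, which behaves like the unquoted version for single words.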

Edit3 - This is my best idea:

After thinking about it, your best solution may be indexing a special field that concatenates the author and the book title. Something like this:

{
  "author": "one",
  "books": [
    { "title": "two" },
    { "title": "three" }
  ],
  "author_book": [ "one two", "one three" ]
}

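Then at search time you can do exact term matches on author_book. Because a term query is not analyzed, it only matches the whole string "one two" if author_book is mapped as not_analyzed (keyword in newer versions):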

{
  "query": {
    "term": {
      "author_book": "one two"
    }
  }
}
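The author_book field has to be computed by your own code before indexing. A minimal Python sketch, assuming documents shaped like the one above and a single space as the separator; the helper names are hypothetical.

```python
def add_author_book(doc):
    """Return a copy of a book document with an `author_book` field added.

    Each entry joins the author and one title with a single space,
    e.g. "one two" and "one three".
    """
    enriched = dict(doc)  # shallow copy; the input document is left unchanged
    enriched["author_book"] = [f'{doc["author"]} {book["title"]}' for book in doc["books"]]
    return enriched


def author_book_term_query(text):
    # Exact (non-analyzed) match: requires author_book to be not_analyzed/keyword.
    return {"query": {"term": {"author_book": text}}}


if __name__ == "__main__":
    doc = {"author": "one", "books": [{"title": "two"}, {"title": "three"}]}
    print(add_author_book(doc)["author_book"])
```

Since "two three" is not one of the generated strings, the exact term query for it finds nothing, as the question requires.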

answered Sep 28 '22 09:09

Zach

I found the answer in this post: Fun With Elasticsearch's Children and Nested Documents. A nested document is the key. The mapping:

{
  "book": {
    "properties": {
      "tags": {
        "type": "multi_field",
        "fields": {
          "tags": { "type": "string", "store": "yes", "index": "analyzed" },
          "facet": { "type": "string", "store": "yes", "index": "not_analyzed" }
        }
      },
      "editions": {
        "type": "nested",
        "properties": {
          "title_author": { "type": "string", "store": "yes", "index": "analyzed" },
          "title": { "type": "string", "store": "yes", "index": "analyzed" }
        }
      }
    }
  }
}

The document:

{
  "tags": ["novel", "crime"],
  "editions": [
    { "title": "two", "title_author": "two one" },
    { "title": "three", "title_author": "three one" }
  ]
}
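The title_author values repeat the author's name after each title, so they are easiest to generate when building the document. A minimal Python sketch; the helper name is hypothetical and the "title, space, author" order follows the example document above.

```python
def build_editions(author, titles):
    """Build the nested `editions` list with a combined `title_author` field."""
    # Each edition stores its own title plus the author's name (e.g. "two one").
    return [{"title": t, "title_author": f"{t} {author}"} for t in titles]


if __name__ == "__main__":
    doc = {"tags": ["novel", "crime"], "editions": build_editions("one", ["two", "three"])}
    print(doc)
```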

Now I can search like this:

{
  "query": {
    "bool": {
      "should": [
        {
          "nested": {
            "path": "editions",
            "query": {
              "match": {
                "editions.title_author": {
                  "query": "one two",
                  "operator": "and"
                }
              }
            }
          }
        }
      ]
    }
  }
}

If I search for "two three" I get no match, while "one two" or "one three" each match. Version 1.1.0 will add another option: the multi_match query with the cross_fields type, which would make it unnecessary to repeat the title; only the author name would need to be added to each nested document. That would keep the index smaller.
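Why this works: with "operator": "and", every query word must occur in the same nested edition, because each nested document is matched on its own. The following in-memory Python simulation illustrates that behaviour. It is not Elasticsearch: lowercasing and splitting on whitespace is only a rough stand-in for the standard analyzer, scoring is ignored, and the helper names are hypothetical. In Elasticsearch the whole book document is returned if at least one edition matches; the sketch lists which editions did.

```python
def analyze(text):
    # Rough stand-in for the standard analyzer: lowercase and split on whitespace.
    return text.lower().split()


def edition_matches(edition, query_text):
    # operator "and": every query token must occur in this one nested document.
    tokens = set(analyze(edition["title_author"]))
    return all(tok in tokens for tok in analyze(query_text))


def matching_titles(doc, query_text):
    """Titles of the editions that would satisfy the nested match query."""
    return [e["title"] for e in doc["editions"] if edition_matches(e, query_text)]


if __name__ == "__main__":
    doc = {
        "tags": ["novel", "crime"],
        "editions": [
            {"title": "two", "title_author": "two one"},
            {"title": "three", "title_author": "three one"},
        ],
    }
    for q in ("one two", "two three", "one"):
        print(q, "->", matching_titles(doc, q))
```

This reproduces the three cases from the question: "one two" matches only edition two, "two three" matches nothing, and "one" matches both.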

answered Sep 28 '22 07:09

fisch
