I want to use ES for a book search. So I decided to put the author name and title (as a nested document) into the index as follows:
curl -XPUT localhost:9200/library/search_books/1 -d'{ "author": "one", "books": [ { "title": "two" }, { "title": "three" } ] }'
What I don't get is how to structure the search query so that it finds only book two when searching for "one two", finds nothing when searching for "two three", and finds all books when searching for "one".
The nested type is a specialised version of the object data type that allows arrays of objects to be indexed in a way that they can be queried independently of each other.
must means: clauses that must match for the document to be included. should means: if these clauses match, they increase the _score; otherwise, they have no effect. They are simply used to refine the relevance score for each document.
Joining queries: Instead of full SQL-style joins, Elasticsearch offers two forms of join which are designed to scale horizontally. Documents may contain fields of type nested. These fields are used to index arrays of objects, where each object can be queried (with the nested query) as an independent document.
Perhaps something like this?
{ "query":{ "bool":{ "must":[ { "term":{ "author":"one" } }, { "nested":{ "path":"books", "query":{ "term":{ "books.title":"two" } } } } ] } } }
That query basically says that a document must have author: one and books.title: two. You can reconfigure that query easily. For example, if you just want to search for authors, remove the nested part. If you want a different book, change the nested part, and so on.
This assumes you are using the actual Nested documents, and not inner objects. For inner objects you can just use fully qualified paths without the special nested query.
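To make the difference between inner objects and nested documents concrete, here is a toy model in plain Python. It is not Elasticsearch code, and the function names are made up for illustration. It only assumes what the documentation quoted above says: inner objects are flattened into one list of values per field, while nested objects are matched one at a time.

```python
# Toy model only: mimics how the two indexing modes see the same document.
doc = {"author": "one", "books": [{"title": "two"}, {"title": "three"}]}


def flatten_inner_objects(document):
    """Inner objects: tokens of the same field are merged across all array items."""
    flat = {"author": [document["author"]]}
    for book in document["books"]:
        for field, value in book.items():
            flat.setdefault("books." + field, []).extend(value.split())
    return flat


def inner_object_match(document, terms):
    """All terms must appear somewhere in the merged books.title values."""
    titles = flatten_inner_objects(document)["books.title"]
    return all(term in titles for term in terms)


def nested_match(document, terms):
    """All terms must appear within the title of one single book."""
    return any(
        all(term in book["title"].split() for term in terms)
        for book in document["books"]
    )


print(inner_object_match(doc, ["two", "three"]))  # True: titles were merged
print(nested_match(doc, ["two", "three"]))        # False: no single book has both
print(nested_match(doc, ["two"]))                 # True: book "two" matches
```

This is why "two three" can only be rejected when the books are real nested documents: with inner objects the two titles end up in the same field of the same document.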
Edit1: You could perhaps accomplish this with clever boosting at index time, although it will only be an approximate solution. If "author" is boosted heavily, it will sort higher than matches to just the title, even if the title matches both parts of the query. You could then use a min_score cutoff to prevent those from displaying.
It's only a loose approximation, since some may creep through. It may also do strange things to the general sorting between "correct" matches.
Edit2: Updated using query_string to expose a "single input" option:
{ "query":{ "query_string" : { "query" : "+author:one +books.title:two" } } }
That's assuming you are using default "inner objects". If you have real Nested types, the query_string becomes much, much more complex:
{ "query":{ "query_string" : { "query" : "+author:one +BlockJoinQuery (filtered(books.title:two)->cache(_type:__books))" } } }
Huge disclaimer: I did not test either of these two query_strings, so they may not be exactly correct. But they show that the Lucene syntax is not overly friendly.
After thinking about it, your best solution may be indexing a special field that concatenates the author and the book title. Something like this:
{ "author": "one", "books": [ { "title": "two" }, { "title": "three" } ], "author_book": [ "one two", "one three" ] }
Then at search time, you can do exact Term matches on author_book:
{ "query" : { "term" : { "author_book" : "one two" } } }
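If you build the documents in application code, the concatenated field can be derived before indexing. A minimal sketch in Python follows. The helper name add_author_book is hypothetical, and it assumes the document shape shown above. Note that a term query is not analyzed, so the exact match on "one two" only works if author_book is indexed without analysis (not_analyzed in the mapping syntax used in this thread); that mapping is an assumption and is not shown here.

```python
def add_author_book(document):
    """Return a copy of the document with an author_book field added.

    Each entry is "<author> <title>", one per book, as in the example above.
    """
    enriched = dict(document)
    enriched["author_book"] = [
        document["author"] + " " + book["title"] for book in document["books"]
    ]
    return enriched


doc = {"author": "one", "books": [{"title": "two"}, {"title": "three"}]}
print(add_author_book(doc)["author_book"])  # ['one two', 'one three']
```

The original dict is left untouched, so the same source record can still be indexed elsewhere without the extra field.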
I found the answer in this post: Fun With Elasticsearch's Children and Nested Documents. A nested Document is the key. The mapping:
{ "book":{ "properties": { "tags": { "type": "multi_field", "fields": { "tags": { "type": "string", "store":"yes", "index": "analyzed" }, "facet": { "type": "string", "store":"yes", "index": "not_analyzed" } } }, "editions": { "type": "nested", "properties": { "title_author": { "type": "string", "store": "yes", "index": "analyzed" }, "title": { "type": "string", "store": "yes", "index": "analyzed" } } } } } }
The document:
{ "tags": ["novel", "crime"], "editions": [ { "title": "two", "title_author": "two one" }, { "title": "three", "title_author": "three one" } ] }
Now I can search like:
{ "query": { "bool": { "should": [ { "nested": { "path": "editions", "query": { "match": { "editions.title_author": { "query": "one two", "operator": "and" } } } } } ] } } }
If I search for "two three", I get no match. If I search for "one two" or "one three", I get one. In version 1.1.0 there will be another option: a multi_match query with the cross_fields type. With that, the title would not have to be repeated in a combined field; only the author name would need to be added to each nested document. That would keep the index smaller.
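For reference, the cross_fields variant might look roughly like the request body built below. This is an untested sketch: the field name editions.author is an assumption (the mapping above has title_author instead), and the helper cross_fields_query is made up for illustration. Python is used only to assemble and print the JSON.

```python
import json


def cross_fields_query(text):
    """Build a nested multi_match request body for the given search text.

    Assumes each nested edition has "title" and "author" fields
    (editions.author is hypothetical and not part of the mapping above).
    """
    return {
        "query": {
            "nested": {
                "path": "editions",
                "query": {
                    "multi_match": {
                        "query": text,
                        "type": "cross_fields",
                        "fields": ["editions.title", "editions.author"],
                        "operator": "and",
                    }
                },
            }
        }
    }


print(json.dumps(cross_fields_query("one two"), indent=2))
```

With the and operator, every search term has to be found in at least one of the listed fields of the same nested edition, which is the behaviour the title_author field provides today.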