Say I have a million (many) documents in my index. I execute a search query sorting the items by some key X.
Now I have a very long list of results: [..., id1, id2, id3, ...]
Question: how do I get id1 and id3 if I know id2, but don't want to execute the whole search or fetch all ids?
I'm looking for a generic solution that works for any search query: given an id that is certain to exist in the results of a query, how do I get the prev/next items for that id? The query should NOT require prior knowledge of anything other than the id whose prev/next are searched for. (In other words, if the results are ordered by title and I search for the prev/next of id X, the title of X is not known at query time, only X's id.)
It is of course possible to execute multiple search queries and achieve the same end result by getting id2 and then playing with ordering to get ids 1 and 3.
EDIT: I think Luc E's answer isn't what I'm looking for. In that scenario, knowledge of the original object's title is required to query for prev/next. I'm looking for a solution where only the id is known at query time.
Example data looks like this:
[...
{id: 32, title: 'AAA'},
{id: 12, title: 'BBB'},
{id: 99, title: 'CCC'},
{id: 3, title: 'DDD'},
{id: 1001, title: 'EEE'},
...]
What I know: id 99. What I don't know: the title of id 99. What I want: the ids of the prev/next items when sorted by the title field (= 12 and 3).
To put it yet another way: I have id 99 but not the title in my hand. I want a query that gives me ids 3 and 12 (they are prev/next sorted by title).
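To make the expected result concrete, here is a minimal in-memory sketch in Python using the example data above. It is only a specification of the desired output, not a solution: it sorts every document, which is exactly what the question wants to avoid in Elasticsearch. The function name `neighbours` is made up for illustration.

```python
from typing import Dict, List, Optional, Tuple


def neighbours(
    docs: List[Dict], doc_id: int, key: str = "title"
) -> Tuple[Optional[int], Optional[int]]:
    """Return (prev_id, next_id) of doc_id when docs are sorted by `key`.

    The id is used as a tiebreaker so the order is deterministic even if
    two documents share the same sort value.
    """
    ordered = sorted(docs, key=lambda d: (d[key], d["id"]))
    ids = [d["id"] for d in ordered]
    pos = ids.index(doc_id)  # raises ValueError if the id is not present
    prev_id = ids[pos - 1] if pos > 0 else None
    next_id = ids[pos + 1] if pos + 1 < len(ids) else None
    return prev_id, next_id


docs = [
    {"id": 32, "title": "AAA"},
    {"id": 12, "title": "BBB"},
    {"id": 99, "title": "CCC"},
    {"id": 3, "title": "DDD"},
    {"id": 1001, "title": "EEE"},
]

print(neighbours(docs, 99))  # (12, 3)
```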
By default, you cannot use from and size to page through more than 10,000 hits. This limit is a safeguard set by the index.max_result_window index setting. If you need to page through more than 10,000 hits, use the search_after parameter instead.
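As a rough sketch of how search_after paging works, the helper below (written in Python, with the made-up name `next_page_body`) derives the request body for the next page from the sort values of the last hit on the current page. It assumes the request already has a deterministic sort and that each hit carries a "sort" array, which Elasticsearch returns when a sort is specified. The tiebreaker field `doc_id` in the usage example is an assumption, not something from the original post.

```python
import copy
from typing import Any, Dict


def next_page_body(body: Dict[str, Any], last_hit: Dict[str, Any]) -> Dict[str, Any]:
    """Build the request body for the page that follows `last_hit`.

    `body` is the search request that produced the current page and
    `last_hit` is the final entry of its hits.hits array.
    """
    if "sort" not in body:
        raise ValueError("search_after needs an explicit sort")
    nxt = copy.deepcopy(body)  # leave the caller's body untouched
    nxt.pop("from", None)  # search_after replaces offset-based paging
    nxt["search_after"] = list(last_hit["sort"])
    return nxt


# Usage with a hypothetical tiebreaker field "doc_id" (a copy of _id).
first_page = {
    "size": 100,
    "query": {"match_all": {}},
    "sort": [{"title": {"order": "asc"}}, {"doc_id": {"order": "asc"}}],
}
last_hit = {"_id": "99", "sort": ["CCC", "99"]}
second_page = next_page_body(first_page, last_hit)
```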
There are two recommended methods to retrieve selected fields from a search query: Use the fields option to extract the values of fields present in the index mapping. Use the _source option if you need to access the original data that was passed at index time.
You can use the search API to search and aggregate data stored in Elasticsearch data streams or indices. The API's query request body parameter accepts queries written in Query DSL. For example, a match query on user.id with the value kimchy, run against my-index-000001, matches documents whose user.id is kimchy.
The _source field contains the original JSON document body that was passed at index time. The _source field itself is not indexed (and thus is not searchable), but it is stored so that it can be returned when executing fetch requests, like get or search.
What you want to do is called deep scrolling, and there are only two ways to do it.
The easiest way is search_after, but you will need to make two requests: one to get id3 and one to get id1.
So, in this example id2 is 128. I sort the documents by the field title, and I have fetched beforehand the value of title for id2, which is title_of_128.
To perform the search_after, I have to add _id as a secondary sort condition, so that the order is unambiguous when several documents share the same title.
Here is my query:
POST test/_search
{
  "size": 2,
  "search_after": ["title_of_128", "128"],
  "sort": [
    { "title": { "order": "asc" } },
    { "_id": { "order": "asc" } }
  ]
}
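The result of this query is id3 and the document that follows it. search_after returns only hits that sort strictly after the given values, so id2 itself is not included; use "size": 1 if you only need id3.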
Now I reverse the direction of the sort in order to retrieve id1:
POST test/_search
{
  "size": 2,
  "search_after": ["title_of_128", "128"],
  "sort": [
    { "title": { "order": "desc" } },
    { "_id": { "order": "desc" } }
  ]
}
The result of this query is id1 and the document that precedes it. Again, id2 itself is excluded, so "size": 1 is enough to get only id1.
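Note that sorting on _id is discouraged (it relies on fielddata, which is deprecated for the _id field). The best practice is to copy the _id into another field that has doc values enabled and use that field as the tiebreaker if you want to use search_after.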
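The two requests above still need the title of id2 up front. One possible way to start from the id alone is to add a third request: an ids query with the same sort and "size": 1, whose hit carries a "sort" array that can be passed straight to search_after. The Python sketch below only builds request bodies and leaves the HTTP call to a caller-supplied function. The names `prev_next`, `search` and the tiebreaker field `doc_id` (a copy of _id, as recommended in the note above) are assumptions for illustration, not part of the original answer.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Body = Dict[str, Any]
# Hypothetical transport: takes a request body, returns the hits.hits list.
SearchFn = Callable[[Body], List[Dict[str, Any]]]


def build_sort(field: str, tiebreaker: str, order: str) -> List[Body]:
    """Primary sort field plus a unique tiebreaker, both in the same direction."""
    return [{field: {"order": order}}, {tiebreaker: {"order": order}}]


def cursor_body(doc_id: str, field: str, tiebreaker: str) -> Body:
    """Request that returns only the known document, with its sort values."""
    return {
        "size": 1,
        "_source": False,
        "query": {"ids": {"values": [doc_id]}},
        "sort": build_sort(field, tiebreaker, "asc"),
    }


def neighbour_body(query: Body, cursor: List[Any], field: str,
                   tiebreaker: str, order: str) -> Body:
    """Request for the single hit right after the cursor in the given direction."""
    return {
        "size": 1,
        "_source": False,
        "query": query,
        "search_after": cursor,
        "sort": build_sort(field, tiebreaker, order),
    }


def prev_next(search: SearchFn, query: Body, doc_id: str, field: str = "title",
              tiebreaker: str = "doc_id") -> Tuple[Optional[str], Optional[str]]:
    """Return (prev_id, next_id) of doc_id within the results of `query`."""
    hits = search(cursor_body(doc_id, field, tiebreaker))
    if not hits:
        raise LookupError(f"document {doc_id!r} not found")
    cursor = hits[0]["sort"]  # e.g. ["title_of_128", "128"]
    nxt = search(neighbour_body(query, cursor, field, tiebreaker, "asc"))
    prv = search(neighbour_body(query, cursor, field, tiebreaker, "desc"))
    prev_id = prv[0]["_id"] if prv else None
    next_id = nxt[0]["_id"] if nxt else None
    return prev_id, next_id
```

This costs three small requests instead of two, but none of them fetches more than one hit, and the caller never needs to know the title. `search` could wrap any client that sends the body to POST test/_search and returns the hits array.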