I am storing book titles in Elasticsearch, and the same title can belong to many stores. Like this:
{
  "books": [
    {
      "id": 1,
      "title": "Title 1",
      "store": "store1"
    },
    {
      "id": 2,
      "title": "Title 1",
      "store": "store2"
    },
    {
      "id": 3,
      "title": "Title 1",
      "store": "store3"
    },
    {
      "id": 4,
      "title": "Title 2",
      "store": "store2"
    },
    {
      "id": 5,
      "title": "Title 2",
      "store": "store3"
    }
  ]
}
How can I get all the books grouped by title, with one result per group (one row per title, so I can get all the ids and stores for it)?
Based on the data above, I want to get two results, each containing all of its ids and stores.
Expected results:
{
  "hits": {
    "total": 2,
    "hits": [
      {
        "0": {
          "title": "Title 1",
          "group": [
            {
              "id": 1,
              "store": "store1"
            },
            {
              "id": 2,
              "store": "store2"
            },
            {
              "id": 3,
              "store": "store3"
            }
          ]
        }
      },
      {
        "1": {
          "title": "Title 2",
          "group": [
            {
              "id": 4,
              "store": "store2"
            },
            {
              "id": 5,
              "store": "store3"
            }
          ]
        }
      }
    ]
  }
}
What you are looking for is not possible in Elasticsearch, at least not with the current version (1.1).
There is a long-standing open issue for this feature, with a lot of +1s and demand behind it.
As for statements: Simon says it requires a lot of refactoring, and although it is planned, there is no way of saying when it will be implemented, let alone shipped.
A similar statement was made by Clinton Gormley in his webinar: field grouping needs a lot of effort to be done right, especially since Elasticsearch is a sharded and distributed environment by nature. It would not be that big a deal if you ignored sharding, but Elasticsearch wants to ship only features that scale with the complete system and work as well on hundreds of machines as they do on a single box.
If you're not tied to Elasticsearch, Solr offers such a feature.
Otherwise, probably the best solution at the moment is to do this client-side. That is, query for some documents, do the grouping on your client, and if needed, fetch more results to fill your desired group size (as far as I know, this is what Solr does under the hood).
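The client-side grouping described above could look roughly like the following Python sketch. It assumes the hits have already been fetched and have the usual `_source` shape; the function name `group_hits_by_title` and the sample data are made up for illustration, and fetching extra pages to fill up groups is left out.

```python
from collections import OrderedDict


def group_hits_by_title(hits):
    """Group search hits by their 'title' field, keeping first-seen order."""
    groups = OrderedDict()
    for hit in hits:
        source = hit["_source"]
        member = {"id": source["id"], "store": source["store"]}
        groups.setdefault(source["title"], []).append(member)
    # One row per title, shaped like the expected result in the question.
    return [{"title": title, "group": members} for title, members in groups.items()]


# Hand-written sample shaped like the "hits.hits" array of a search response.
sample_hits = [
    {"_source": {"id": 1, "title": "Title 1", "store": "store1"}},
    {"_source": {"id": 2, "title": "Title 1", "store": "store2"}},
    {"_source": {"id": 3, "title": "Title 1", "store": "store3"}},
    {"_source": {"id": 4, "title": "Title 2", "store": "store2"}},
    {"_source": {"id": 5, "title": "Title 2", "store": "store3"}},
]

grouped = group_hits_by_title(sample_hits)
```

Note that `total` in the search response still counts documents, not groups, so the number of groups has to be taken from the length of the grouped list.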
Not exactly what you wanted, but you could also go for aggregations; create one bucket for your title and have a sub-aggregation done on the id field. You won't get the store values with this, but you could retrieve them from your datastore once you have the ids.
{
  "aggs": {
    "titles": {
      "terms": { "field": "title" },
      "aggs": {
        "ids": {
          "terms": { "field": "id" }
        }
      }
    }
  }
}
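To turn the response of such a nested terms aggregation into a title-to-ids mapping on the client, something like the following Python sketch might do. The helper name `ids_by_title` and the sample response are hypothetical; the sample also assumes that `title` is indexed as a single unanalyzed term, since an analyzed field would produce one bucket per token rather than per full title.

```python
def ids_by_title(response):
    """Map each title bucket key to the list of id bucket keys nested under it."""
    result = {}
    for bucket in response["aggregations"]["titles"]["buckets"]:
        result[bucket["key"]] = [sub["key"] for sub in bucket["ids"]["buckets"]]
    return result


# Hand-written sample shaped like a terms / sub-terms aggregation response.
sample_response = {
    "aggregations": {
        "titles": {
            "buckets": [
                {
                    "key": "Title 1",
                    "doc_count": 3,
                    "ids": {"buckets": [
                        {"key": 1, "doc_count": 1},
                        {"key": 2, "doc_count": 1},
                        {"key": 3, "doc_count": 1},
                    ]},
                },
                {
                    "key": "Title 2",
                    "doc_count": 2,
                    "ids": {"buckets": [
                        {"key": 4, "doc_count": 1},
                        {"key": 5, "doc_count": 1},
                    ]},
                },
            ]
        }
    }
}

title_to_ids = ids_by_title(sample_response)
```

The resulting ids can then be used to look up the store values in the primary datastore.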
Edit: It seems that with the top_hits aggregation, result grouping could be implemented soon.
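You can get the desired result above by nesting aggregations and putting a top_hits aggregation inside them. The example below uses fields from a different index (color, size, productDetails), so substitute your own: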
{
  "aggs": {
    "set": {
      "terms": {
        "field": "id"
      },
      "aggs": {
        "color": {
          "terms": {
            "field": "color"
          },
          "aggs": {
            "products": {
              "top_hits": {
                "_source": {
                  "include": ["size"]
                }
              }
            }
          }
        },
        "product": {
          "top_hits": {
            "_source": {
              "include": ["productDetails"]
            },
            "size": 1
          }
        }
      }
    }
  }
}
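Applied to the books from the question, the same idea might look like the Python sketch below: a terms aggregation on `title` with a top_hits sub-aggregation, plus a small helper that reshapes the response into the rows the question asks for. The aggregation names `titles` and `group`, the group size of 10, and the sample response are assumptions for illustration. It also assumes a version that ships top_hits and a `title` field indexed as a single unanalyzed term; newer versions spell the source filter `includes` rather than `include`.

```python
# Request body: one bucket per title, each carrying its matching documents.
query = {
    "size": 0,  # only the aggregation is needed, not the regular hits
    "aggs": {
        "titles": {
            "terms": {"field": "title"},
            "aggs": {
                "group": {
                    "top_hits": {
                        "size": 10,  # assumed upper bound on stores per title
                        "_source": {"include": ["id", "store"]},
                    }
                }
            },
        }
    },
}


def rows_from_response(response):
    """Build one row per title bucket from the top_hits sub-aggregation."""
    rows = []
    for bucket in response["aggregations"]["titles"]["buckets"]:
        members = [hit["_source"] for hit in bucket["group"]["hits"]["hits"]]
        rows.append({"title": bucket["key"], "group": members})
    return rows


# Hand-written sample shaped like the response to the query above.
sample_response = {
    "aggregations": {"titles": {"buckets": [
        {"key": "Title 1", "doc_count": 3, "group": {"hits": {"total": 3, "hits": [
            {"_source": {"id": 1, "store": "store1"}},
            {"_source": {"id": 2, "store": "store2"}},
            {"_source": {"id": 3, "store": "store3"}},
        ]}}},
        {"key": "Title 2", "doc_count": 2, "group": {"hits": {"total": 2, "hits": [
            {"_source": {"id": 4, "store": "store2"}},
            {"_source": {"id": 5, "store": "store3"}},
        ]}}},
    ]}}
}

rows = rows_from_response(sample_response)
```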