
Sub-queries with "union" in elasticsearch

I'm currently busy working on a project in which we chose to use Elasticsearch as the search engine for a classifieds website.

Currently, I have the following business rule:

List 25 adverts per page. Of these 25, 10 of the displayed adverts must be "Paid Adverts", and the other 15 must be "Free". All 25 must be relevant to the search performed (e.g. keywords, region, price, category).

I know I can do this using two separate queries, but this seems like an immense waste of resources. Is it possible to do "sub-queries" (if you can call them that) and union the results into a single result set, fetching only 10 "Paid" adverts and 15 "Free" ones from Elasticsearch in one single query? This assumes, of course, that there are enough adverts to make this requirement possible.

Thanks for any help!

edit - Just adding my mapping information for more clarity.

"properties": {
       "advertText": {
          "type": "string",
          "boost": 2,
          "store": true,
          "analyzer": "snowball"
       },
       "canonical": {
          "type": "string",
          "store": true
       },
       "category": {
          "properties": {
             "id": {
                "type": "string",
                "store": true
             },
             "name": {
                "type": "string",
                "store": true
             },
             "parentCategory": {
                "type": "string",
                "store": true
             }
          }
       },
       "contactNumber": {
          "type": "string",
          "index": "not_analyzed",
          "store": true
       },
       "emailAddress": {
          "type": "string",
          "store": true,
          "analyzer": "url_email_analyzer"
       },
       "advertType": {
          "type": "string",
          "index": "not_analyzed"
       },
       ...
}

What I want then is to be able to query this and get 10 results where "advertType": "Paid" and 15 where "advertType": "Free"...

iLikeBreakfast asked Jun 25 '14 12:06


1 Answer

There are a couple of approaches you can take.

First, you can try using the multi-search API:

Multi Search API

The multi search API allows to execute several search requests within the same API. The endpoint for it is _msearch.

The format of the request is similar to the bulk API format

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-multi-search.html

A basic example:

curl -XGET 'http://127.0.0.1:9200/advertising_index/_msearch?pretty=1'  -d '
{}
{"query" : {"match" : {"Paid_Ads" : "search terms"}}, "size" : 10}
{}
{"query" : {"match" : {"Free" : "search terms"}}, "size" : 15}
'

I've made up the fields and query, but you should get the idea: you hit the _msearch endpoint and pass it a series of queries, each preceded by a header line (here the empty braces {}). For Paid I've set size to 10 and for Free I've set size to 15.
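To tie this back to the mapping in the question, here is a rough sketch of how the same _msearch body could be built with the real field names. It assumes advertText is the field being searched and that advertType holds exactly "Paid" or "Free". It also assumes the filtered query syntax used elsewhere in this answer, which fits Elasticsearch 1.x (later versions replaced it with a bool query and a filter clause). The helper name build_msearch_body and the sample search terms are made up for illustration.

```python
import json


def build_msearch_body(search_terms, paid_size=10, free_size=15):
    """Build a newline-delimited _msearch body with one search per advert type.

    Hypothetical helper: field names come from the mapping in the question,
    the "filtered" query syntax assumes Elasticsearch 1.x.
    """
    lines = []
    for advert_type, size in (("Paid", paid_size), ("Free", free_size)):
        # Header line: empty, so the index given in the URL is used.
        lines.append(json.dumps({}))
        # Body line: full-text match, restricted to a single advert type.
        lines.append(json.dumps({
            "query": {
                "filtered": {
                    "query": {"match": {"advertText": search_terms}},
                    "filter": {"term": {"advertType": advert_type}},
                }
            },
            "size": size,
        }))
    # The _msearch body must be terminated by a newline.
    return "\n".join(lines) + "\n"


body = build_msearch_body("mountain bike")
print(body)
```

The resulting string can be sent as the request body in place of the hand-written lines in the curl example above.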

Subject to the details of your own implementation you should be able to use something like this.
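Keep in mind that _msearch returns one response per query inside a "responses" array, not a single merged result set, so the "union" still has to happen in your application. A minimal sketch of that step, assuming the Paid query was sent first and using a hypothetical, heavily trimmed sample response:

```python
def merge_adverts(msearch_response):
    """Concatenate the hits of every sub-response, in request order."""
    merged = []
    for response in msearch_response.get("responses", []):
        # A failed sub-search carries an "error" key instead of "hits";
        # it contributes nothing to the merged page.
        merged.extend(response.get("hits", {}).get("hits", []))
    return merged


# Hypothetical sample data, not real Elasticsearch output.
sample_response = {
    "responses": [
        {"hits": {"total": 2, "hits": [{"_id": "paid-1"}, {"_id": "paid-2"}]}},
        {"hits": {"total": 1, "hits": [{"_id": "free-1"}]}},
    ]
}

page = merge_adverts(sample_response)
print([hit["_id"] for hit in page])  # ['paid-1', 'paid-2', 'free-1']
```

Paid adverts come first here only because of the order of the requests; any interleaving of the two lists would be up to your own display rules.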

If that does not work for whatever reason, you can also try using a limit filter:

Limit Filter

A limit filter limits the number of documents (per shard) to execute on. For example:

{
    "filtered" : {
        "filter" : {
             "limit" : {"value" : 100}
         },
         "query" : {
            "term" : { "name.first" : "shay" }
        }
    }
}

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-limit-filter.html

Note that the limits are per shard, not per index. Given a default of 5 primary shards per index, to get a total response of 10 you would set the limit to 2 (2 × 5 = 10). Also note that this can return fewer results than expected if one shard holds many matches while another holds none.
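The per-shard arithmetic, and the way it can fall short, can be illustrated with a small simplified model. This is not Elasticsearch code: both function names are made up, and the model simply assumes each shard contributes at most "limit" matching documents.

```python
import math


def per_shard_limit(total_wanted, primary_shards=5):
    """Per-shard limit needed so that all shards together can reach total_wanted."""
    return math.ceil(total_wanted / primary_shards)


def max_docs_returned(matches_per_shard, limit):
    """Upper bound on returned documents when each shard is capped at `limit`."""
    return sum(min(matches, limit) for matches in matches_per_shard)


limit = per_shard_limit(10)  # 10 results over 5 shards -> limit of 2

# Matches spread evenly over the shards: the full 10 are reachable.
print(max_docs_returned([2, 2, 2, 2, 2], limit))   # 10

# All matches sit on one shard: only 2 come back, although 10 exist.
print(max_docs_returned([10, 0, 0, 0, 0], limit))  # 2
```

The second case is the incomplete-results problem described above, and it is the main reason the multi-search approach is the safer of the two.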

You would then combine two filters with a bool filter:

Bool Filter

A filter that matches documents matching boolean combinations of other queries. Similar in concept to Boolean query, except that the clauses are other filters. Can be placed within queries that accept a filter.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-bool-filter.html

I've not fleshed this one out in any detail as it will require more information about your specific indexes, mappings, data and queries.

John Petrone answered Sep 22 '22 22:09