Elasticsearch - Distinct elements from multiple fields

I created a mapping to index my MongoDB collection with Elasticsearch. Here are the mapping properties:

"properties" : {
          "address_components" : {
            "properties" : {
              "_id" : {
                "type" : "string"
              },
              "subLocality1" : {
                "type" : "string",
                "index" : "not_analyzed"
              },
              "subLocality2" : {
                "type" : "string",
                "index" : "not_analyzed"
              },
              "subLocality3" : {
                "type" : "string",
                "index" : "not_analyzed"
              },
              "city" : {
                "type" : "string",
                "index" : "not_analyzed"
              }
            }
          }
}

Now I want to retrieve the overall set of unique values across these fields: subLocality1, subLocality2, subLocality3, and city. Each distinct value must contain a query string q as a substring, and each distinct value should be returned together with its corresponding city value.

Example:

"address_components" : {
    "subLocality1" : "s1",
    "subLocality2" : "s1",
    "subLocality3" : "s2",
    "city" : "a"
  }

"address_components" : {
    "subLocality1" : "s3",
    "subLocality2" : "s1",
    "subLocality3" : "s2",
    "city" : "a"
  }

"address_components" : {
    "subLocality1" : "s2",
    "subLocality2" : "s1",
    "subLocality3" : "s4",
    "city" : "a"
  }

For the documents indexed above, the expected result is:

"address_components" : {
    "subLocality1" : "s1",
    "subLocality2" : "s1",
    "subLocality3" : "s2",
    "city" : "ct1"
  }

"address_components" : {
    "subLocality1" : "s3",
    "subLocality2" : "s1",
    "subLocality3" : "s2",
    "city" : "ct1"
  }

"address_components" : {
    "subLocality1" : "s2",
    "subLocality2" : "s1",
    "subLocality3" : "s4",
    "city" : "ct1"
  }
{s1, a}, {s2, a}, {s3, a}, {s4, a}, {a, a}
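
To make the requirement concrete, here is a minimal plain-Python sketch of the computation being asked for, run over the three example documents that have city "a". It is not Elasticsearch code. The function name `distinct_with_city` and the `FIELDS` tuple are hypothetical names chosen for illustration, and q is treated as a simple case-sensitive substring.

```python
# Example documents from the question (address_components only, city "a").
docs = [
    {"subLocality1": "s1", "subLocality2": "s1", "subLocality3": "s2", "city": "a"},
    {"subLocality1": "s3", "subLocality2": "s1", "subLocality3": "s2", "city": "a"},
    {"subLocality1": "s2", "subLocality2": "s1", "subLocality3": "s4", "city": "a"},
]

# Fields whose values are pooled into one set of distinct items.
FIELDS = ("subLocality1", "subLocality2", "subLocality3", "city")


def distinct_with_city(documents, q=""):
    """Return the set of (value, city) pairs whose value contains q."""
    pairs = set()
    for doc in documents:
        for field in FIELDS:
            value = doc.get(field)
            # Keep the value only if it exists and contains the substring q.
            if value is not None and q in value:
                pairs.add((value, doc.get("city")))
    return pairs


print(sorted(distinct_with_city(docs)))
```

With an empty q every value qualifies, which gives the five pairs listed above; passing a non-empty q narrows the set to the values that contain it.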

I tried doing this with an Elasticsearch terms aggregation:

GET /rescu/rescu/_search?pretty=true&search_type=count

{
    "aggs" : {
        "distinct_locations" : {
            "terms" : {
                "script" : "doc['address_components.subLocality1'].value"
            }
        }
    }
}

But from what I have read, a terms aggregation applies to a single field only.

Ritesh Kumar Gupta asked Jan 23 '15 17:01


1 Answer

I found the answer myself after going through the Elasticsearch API docs. We need to use a script that returns the values of several fields, so that the terms aggregation collects terms from all of them:

GET /rescu/rescu/_search?pretty=true&search_type=count

{
  "aggs": {
    "distinct_locations": {
      "terms": {
        "script": "[doc['address_components.subLocality1'].value,doc['address_components.subLocality2'].value,doc['address_components.subLocality3'].value]",
        "size": 5000
      }
    }
  }
}
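
If the field list changes often, the request body above can be generated rather than typed by hand. The following is a rough Python sketch under a few assumptions: `build_distinct_query` and `filter_buckets` are hypothetical helper names, the response is assumed to have the usual terms-aggregation shape (`aggregations` → `distinct_locations` → `buckets`, each with a `key`), and the substring filter for q is applied client-side instead of inside Elasticsearch.

```python
import json

# Field paths used in the script from the answer.
DEFAULT_FIELDS = (
    "address_components.subLocality1",
    "address_components.subLocality2",
    "address_components.subLocality3",
)


def build_distinct_query(fields=DEFAULT_FIELDS, size=5000):
    """Build a terms-aggregation body whose script lists each field's value."""
    script = "[" + ",".join("doc['%s'].value" % f for f in fields) + "]"
    return {
        "aggs": {
            "distinct_locations": {
                "terms": {"script": script, "size": size}
            }
        }
    }


def filter_buckets(response, q):
    """Return bucket keys containing q (assumes the usual buckets layout)."""
    buckets = response["aggregations"]["distinct_locations"]["buckets"]
    return [b["key"] for b in buckets if q in b["key"]]


# Print the body so it can be pasted into the same GET request.
print(json.dumps(build_distinct_query(), indent=2))
```

Passing a longer `fields` tuple (for example one that also includes `address_components.city`) adds that field's values to the same set of terms. Note that this only pools the values; it does not pair each value with its city.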

Ritesh Kumar Gupta answered Oct 04 '22 20:10