Join query in ElasticSearch

Is there any way (a query) to join the two JSON documents below in Elasticsearch?

{
product_id: "1111",
price: "23.56",
stock: "100"
}

{
product_id: "1111",
category: "iPhone case",
manufacturer: "Belkin"
}

The two JSON documents above are processed (input) under two different types in Logstash, so they are indexed under different values of the 'type' field in Elasticsearch.

What I want is to join the two JSON documents on the product_id field.

Fawad asked Mar 24 '14 13:03


1 Answer

It depends on what you mean by JOIN. Elasticsearch is not like a regular relational database that supports JOINs between tables. It is a text search engine that manages documents within indexes.

On the other hand, you can search within the same index over multiple types using fields that are common to every type.

For example, taking your data, I can create an index with two types and their data as follows:

curl -XPOST localhost:9200/product -d '{
    "settings" : {
        "number_of_shards" : 5
    }
}'

curl -XPOST localhost:9200/product/type1/_mapping -d '{
        "type1" : {
            "properties" : {
                "product_id" : { "type" : "string" },
                "price" : { "type" : "integer" },
                "stock" : { "type" : "integer" }
            }
        }   
}'              

curl -XPOST localhost:9200/product/type2/_mapping -d '{
        "type2" : {
            "properties" : {
                "product_id" : { "type" : "string" },
                "category" : { "type" : "string" },
                "manufacturer" : { "type" : "string" }
            }
        }
}'  

curl -XPOST localhost:9200/product/type1/1 -d '{
        product_id: "1111", 
        price: "23",
        stock: "100"
}'

curl -XPOST localhost:9200/product/type2/1 -d '{
        product_id: "1111",
        category: "iPhone case",
        manufacturer: "Belkin"
}'

I effectively created one index called product with two types, type1 and type2. Now I can run the following query and it will return both documents:

curl -XGET 'http://localhost:9200/product/_search?pretty=1' -d '{
    "query": {
        "query_string" : {
            "query" : "product_id:1111"
        }
    }
}'

{
  "took" : 95,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.5945348,
    "hits" : [ {
      "_index" : "product",
      "_type" : "type1",
      "_id" : "1",
      "_score" : 0.5945348, "_source" : {
    product_id: "1111",
    price: "23",
    stock: "100"
}
    }, {
      "_index" : "product",
      "_type" : "type2",
      "_id" : "1",
      "_score" : 0.5945348, "_source" : {
    product_id: "1111",
    category: "iPhone case",
    manufacturer: "Belkin"
}
    } ]
  }
}

This works because Elasticsearch searches over all documents within that index regardless of their type. It is still different from a JOIN, in the sense that Elasticsearch does not compute a Cartesian product of the documents that belong to each type: the two documents come back as separate hits, not as one combined row.
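If you need a single combined record per product, one option is to merge the hits in your application after the search returns. The following is only a minimal sketch of that idea in Python, not something Elasticsearch does for you: the function name merge_hits_by_key is hypothetical, and the hits list is written out by hand in the shape of the search response above rather than fetched from a live cluster.

```python
from collections import defaultdict


def merge_hits_by_key(hits, key="product_id"):
    """Combine the _source of every hit that shares the same key value.

    Hypothetical helper: this is an application-side merge, not an
    Elasticsearch feature. If two hits carry the same field name,
    the later hit overwrites the earlier one.
    """
    merged = defaultdict(dict)
    for hit in hits:
        source = hit["_source"]
        merged[source[key]].update(source)
    return dict(merged)


# Hits shaped like the "hits" array of the search response above
# (values copied from it by hand for illustration).
hits = [
    {"_type": "type1", "_id": "1",
     "_source": {"product_id": "1111", "price": "23", "stock": "100"}},
    {"_type": "type2", "_id": "1",
     "_source": {"product_id": "1111", "category": "iPhone case",
                 "manufacturer": "Belkin"}},
]

products = merge_hits_by_key(hits)
print(products["1111"])
```

Here the two hits for product_id 1111 end up in one dictionary holding price, stock, category and manufacturer. This assumes every hit contains the join field; a hit without product_id would raise a KeyError.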

Hope that helps

isaac.hazan answered Nov 08 '22 18:11