
Querying across multiple elasticsearch types

I want to fetch documents that are present in multiple types (type1 AND type2 AND type3...) in Elasticsearch 5.0. I know I can search across multiple types by listing them in the URL (type1,type2) or by filtering on the _type field, but all of these conditions are OR (type1 OR type2). How do I achieve an AND condition?

Here are two documents in my index:

{
   "_index":"cust_58e8700034fa4e368590fb1396e2641c",
   "_type":"unique-fp-domains",
   "_id":"n_d4dbba7309a94503b25eca735078f17c_258b3ad1a11aba282f35908662bdc5432d68fd96bf3ca90013dcdd5764331399",
   "_version":2,
   "_score":1,
   "_source":{
      "mg_timestamp":1579866709096,
      "violated-directive":"connect-src",
      "fp-hash":"258b3ad1a11aba282f35908662bdc5432d68fd96bf3ca90013dcdd5764331399",
      "time":1579866709096,
      "scan-id":"n_d4dbba7309a94503b25eca735078f17c",
      "blocked-uri":"play.sundaysky.com"
   }
}


{
   "_index":"cust_58e8700034fa4e368590fb1396e2641c",
   "_type":"tag-alexa-top1k-using-csp-tld-domain",
   "_id":"AW_XY4P4kmprPQ28bTUb",
   "_version":1,
   "_score":1,
   "_source":{
      "tagged-domain":"sundaysky.com",
      "tag-guidance":"FP",
      "additional-tag-metadata-isbase64-encoded":"eyJ0b3RhbC1hbGV4YS1tYXRjaGVzIjoyMzh9",
      "project-id":2,
      "fp-hash":"258b3ad1a11aba282f35908662bdc5432d68fd96bf3ca90013dcdd5764331399",
      "scan-id":"n_d4dbba7309a94503b25eca735078f17c"
   }
}

I want to fetch the documents from these two types, in the same index, that have "scan-id":"n_d4dbba7309a94503b25eca735078f17c".

I tried this:

{
  "size": 100,
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "filter": [
              {
                "term": {
                  "_type": {
                    "value": "tag-alexa-top1k-using-csp-tld-domain"
                  }
                }
              },
              {
                "term": {
                  "scan-id": {
                    "value": "n_d4dbba7309a94503b25eca735078f17c"
                  }
                }
              }
            ]
          }
        },
        {
          "bool": {
            "filter": [
              {
                "term": {
                  "_type": {
                    "value": "unique-fp-domains"
                  }
                }
              },
              {
                "term": {
                  "scan-id": {
                    "value": "n_d4dbba7309a94503b25eca735078f17c"
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}

But it doesn't work.

Asked May 20 '26 11:05 by z3r0


2 Answers

Elasticsearch is not good at joining different collections of documents, but in your case you might be able to solve the problem with a parent-child relationship.

How to query many index types together in an AND fashion?

When you have a one-to-many relationship, you can model it with parent-child. Let's suppose that the type unique-fp-domains is the "parent" type and that its scan-id field is a unique identifier. Let's also suppose that tag-alexa-top1k-using-csp-tld-domain is the "child" type and that every document of that type refers to exactly one document in unique-fp-domains.

Then we should create the Elasticsearch mapping in the following way:

PUT /cust_58
{
  "mappings": {
    "unique-fp-domains": {},
    "tag-alexa-top1k-using-csp-tld-domain": {
      "_parent": {
        "type": "unique-fp-domains" 
      }
    }
  }
}

And insert the documents like this:

# "parent"
PUT /cust_58/unique-fp-domains/n_d4dbba7309a94503b25eca735078f17c
{
    "mg_timestamp": 1579866709096,
    "violated-directive": "connect-src",
    "fp-hash": "258b3ad1a11aba282f35908662bdc5432d68fd96bf3ca90013dcdd5764331399",
    "time": 1579866709096,
    "scan-id": "n_d4dbba7309a94503b25eca735078f17c",
    "blocked-uri": "play.sundaysky.com"
}

# "child"
POST /cust_58/tag-alexa-top1k-using-csp-tld-domain?parent=n_d4dbba7309a94503b25eca735078f17c
{
    "tagged-domain": "sundaysky.com",
    "tag-guidance": "FP",
    "additional-tag-metadata-isbase64-encoded": "eyJ0b3RhbC1hbGV4YS1tYXRjaGVzIjoyMzh9",
    "project-id": 2,
    "fp-hash": "258b3ad1a11aba282f35908662bdc5432d68fd96bf3ca90013dcdd5764331399",
    "scan-id": "n_d4dbba7309a94503b25eca735078f17c"
}

Now we can query for parent documents that have at least one child associated with them. This is effectively a join on the parent ID, which we forced to be the scan-id by setting the _id of the parent document manually.

The query will use has_child and will look like this:

POST /cust_58/unique-fp-domains/_search
{
    "query": {
        "has_child": {
            "type": "tag-alexa-top1k-using-csp-tld-domain",
            "query": {
                "match_all": {}
            },
            "inner_hits": {}
        }
    }
}

Note that we use inner_hits to tell Elasticsearch to retrieve the matched "child" documents.
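
To make the semantics concrete, here is a minimal in-memory sketch in Python of what has_child with inner_hits gives you. It does not talk to Elasticsearch; the helper name has_child, the extra parent n_no_children, and the trimmed sample dictionaries are hypothetical and only mirror the two documents above.

```python
from collections import defaultdict

# Parent documents keyed by _id (here the scan-id, as in the PUT above).
parents = {
    "n_d4dbba7309a94503b25eca735078f17c": {"blocked-uri": "play.sundaysky.com"},
    "n_no_children": {"blocked-uri": "example.org"},  # hypothetical parent without a child
}

# Child documents; "parent" plays the role of the ?parent= URL parameter.
children = [
    {"parent": "n_d4dbba7309a94503b25eca735078f17c", "tagged-domain": "sundaysky.com"},
]


def has_child(parents, children):
    """Return parents that have at least one child, with those children attached."""
    by_parent = defaultdict(list)
    for child in children:
        by_parent[child["parent"]].append(child)
    return [
        {"_id": pid, "_source": src, "inner_hits": by_parent[pid]}
        for pid, src in parents.items()
        if pid in by_parent
    ]


hits = has_child(parents, children)
print([hit["_id"] for hit in hits])
```

Only the parent that has a child comes back, and its matching children travel with it, which is the shape of the response shown next.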

The output would look like:

    "hits": [
      {
        "_index": "cust_58",
        "_type": "unique-fp-domains",
        "_id": "n_d4dbba7309a94503b25eca735078f17c",
        "_score": 1.0,
        "_source": {
          "mg_timestamp": 1579866709096,
          "violated-directive": "connect-src",
...
        },
        "inner_hits": {
          "tag-alexa-top1k-using-csp-tld-domain": {
            "hits": {
              "total": 1,
              "max_score": 1.0,
              "hits": [
                {
                  "_type": "tag-alexa-top1k-using-csp-tld-domain",
                  "_id": "AW_xhfnnIzWDkoWd1czA",
                  "_score": 1.0,
                  "_routing": "n_d4dbba7309a94503b25eca735078f17c",
                  "_parent": "n_d4dbba7309a94503b25eca735078f17c",
                  "_source": {
                    "tagged-domain": "sundaysky.com",
...
                  }

What are the downsides of using parent-child?

  • the parent ID should be unique
  • join is only on parent ID
  • some performance overhead:

    If you care about query performance you should not use this query.

  • to enable parent-child one will have to change the mappings and reindex the existing data
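
If reindexing is not an option, one alternative (not part of the parent-child approach, and only a sketch) is to fetch the documents of all the types with an ordinary OR query and do the AND step in the application: group the hits by scan-id and keep only the groups that contain every required type. The function name and the sample hits below are hypothetical.

```python
from collections import defaultdict

REQUIRED_TYPES = {"unique-fp-domains", "tag-alexa-top1k-using-csp-tld-domain"}


def scans_present_in_all_types(hits, required_types=REQUIRED_TYPES):
    """Group search hits by scan-id and keep scan-ids seen in every required type."""
    by_scan = defaultdict(lambda: defaultdict(list))
    for hit in hits:
        by_scan[hit["_source"]["scan-id"]][hit["_type"]].append(hit)
    return {
        scan_id: dict(by_type)
        for scan_id, by_type in by_scan.items()
        if required_types <= set(by_type)
    }


# Hits shaped like the "hits.hits" array of a search response (sample data).
sample_hits = [
    {"_type": "unique-fp-domains", "_source": {"scan-id": "scan-a"}},
    {"_type": "tag-alexa-top1k-using-csp-tld-domain", "_source": {"scan-id": "scan-a"}},
    {"_type": "unique-fp-domains", "_source": {"scan-id": "scan-b"}},
]

joined = scans_present_in_all_types(sample_hits)
print(sorted(joined))
```

This avoids changing the mappings, but it only sees the hits that were actually returned, so the search has to use a large enough size or scroll through all results.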

Other important things to consider

In Elasticsearch 6, an index can hold only a single mapping type, and types go away entirely in later versions. The good news is that the join datatype, which replaces the _parent mapping, is already available in the late 5.x releases.

In general, Elasticsearch is not very good at managing relations between objects, but there are a few ways to deal with them.
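
As a rough sketch of the join datatype, the parent-child relation moves from the _parent mapping into a regular field of the document. The field name doc-relation is a hypothetical choice, and where these properties sit in the full mapping depends on the Elasticsearch version, so check the reference for yours. The Python below only builds the request bodies; it does not send them.

```python
import json

PARENT = "unique-fp-domains"
CHILD = "tag-alexa-top1k-using-csp-tld-domain"

# Mapping fragment: one join field declaring the PARENT -> CHILD relation.
properties = {
    "doc-relation": {"type": "join", "relations": {PARENT: CHILD}},
}


def parent_doc(source):
    """Body of a parent document: the join field only names the relation."""
    return {**source, "doc-relation": {"name": PARENT}}


def child_doc(source, parent_id):
    """Body of a child document: relation name plus the parent's _id."""
    return {**source, "doc-relation": {"name": CHILD, "parent": parent_id}}


child = child_doc(
    {"tagged-domain": "sundaysky.com"},
    "n_d4dbba7309a94503b25eca735078f17c",
)
print(json.dumps(child, indent=2))
```

A child still has to be indexed with its routing set to the parent's ID so that both end up on the same shard, just as the parent parameter did before.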

Hope that helps!

Answered May 24 '26 11:05 by Nikolay Vasiliev


I think this query will solve your problem:

{
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "_type": ["tag-alexa-top1k-using-csp-tld-domain"]
          }
        },
        {
          "terms": {
            "_type": ["unique-fp-domains"]
          }
        }
      ],
      "filter": [
        {
          "term": {
            "scan-id": "n_d4dbba7309a94503b25eca735078f17c"
          }
        }
      ]
    }
  }
}
Answered May 24 '26 12:05 by Mesut Aslan


