 

Elasticsearch / Kibana: Application-side joins

Is it possible with Kibana (preferably the shiny new version 4 beta) to perform application-side joins?

I know that ES / Kibana is not built to replace relational databases and that it is normally a better idea to denormalize my data. In this use case, however, that is not the best approach, since the index size is exploding and performance is dropping:

I'm indexing billions of documents containing session information about network flows, like this: source IP, source port, destination IP, destination port, timestamp.

Now I also want to collect additional information for each IP address, such as geolocation, ASN, reverse DNS, etc. Adding this information to every single session document makes the whole database unmanageable: there are millions of documents with the same IP addresses, and redundantly adding the same information to all of them leads to massive bloat and an unresponsive user experience, even on a cluster with hundreds of gigabytes of RAM.

Instead, I would like to create a separate index containing only unique IP addresses and the metadata I have collected for each of them.
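A minimal sketch of building that separate index, assuming the session documents are available as Python dicts and that the per-IP metadata comes from some lookup you already have (here a hypothetical `lookup_metadata` stand-in). It produces a Bulk API body in NDJSON form and uses the IP address itself as the document `_id`, so each address is stored exactly once; the index name `ips` and all field names are illustrative assumptions.

```python
import json

def lookup_metadata(ip):
    # Hypothetical stand-in for a real GeoIP / ASN / reverse-DNS lookup.
    return {"country": "unknown", "asn": None, "fqdn": None}

def build_ip_index_bulk(sessions, index="ips"):
    """Return an NDJSON Bulk API body with one document per unique IP."""
    unique_ips = set()
    for s in sessions:
        unique_ips.add(s["source_ip"])
        unique_ips.add(s["destination_ip"])
    lines = []
    for ip in sorted(unique_ips):
        # Action line: the IP is the _id, so re-indexing overwrites instead of duplicating.
        lines.append(json.dumps({"index": {"_index": index, "_id": ip}}))
        # Source line: the IP plus its metadata.
        lines.append(json.dumps({"ip": ip, **lookup_metadata(ip)}))
    # The Bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"

sessions = [
    {"source_ip": "10.0.0.1", "source_port": 51000, "destination_ip": "10.0.0.2",
     "destination_port": 443, "timestamp": "2014-11-12T07:11:00Z"},
    {"source_ip": "10.0.0.1", "source_port": 51001, "destination_ip": "10.0.0.2",
     "destination_port": 443, "timestamp": "2014-11-12T07:11:05Z"},
]
print(build_ip_index_bulk(sessions))
```

Two sessions that share both addresses yield only two IP documents, which is the whole point of moving the metadata out of the session index.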

The question is: how can I still analyze my data using Kibana? For each document returned by the query, Kibana should perform a lookup in the IP index and "virtually enrich" each IP address with this information, something like adding virtual fields on the fly so that the structure would look like this:

source IP, source port, source country, source ASN, source FQDN

I'm aware that this would come at the cost of multiple queries.
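A minimal sketch of that "virtual enrichment", done in the application outside Kibana. It assumes the session hits from the first query and the IP documents from the second lookup are already plain dicts; the `source_country` / `source_asn` / `source_fqdn` names follow the question, while `ip`, `country`, `asn` and `fqdn` in the IP documents are assumed field names.

```python
def enrich_sessions(session_hits, ip_docs):
    """Merge IP metadata into each session as virtual source_* fields."""
    # Index the IP documents by address for constant-time lookups.
    by_ip = {doc["ip"]: doc for doc in ip_docs}
    enriched = []
    for hit in session_hits:
        meta = by_ip.get(hit["source_ip"], {})
        # Copy so the original hit is left untouched; missing metadata becomes None.
        row = dict(hit)
        row["source_country"] = meta.get("country")
        row["source_asn"] = meta.get("asn")
        row["source_fqdn"] = meta.get("fqdn")
        enriched.append(row)
    return enriched

hits = [{"source_ip": "192.0.2.10", "source_port": 40000}]
ips = [{"ip": "192.0.2.10", "country": "DE", "asn": 64500, "fqdn": "host.example.net"}]
print(enrich_sessions(hits, ips))
```

The cost the question mentions shows up here as the extra round trip needed to fetch `ip_docs`; the merge itself is a cheap dictionary lookup per hit.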

user167172 asked Nov 12 '14 07:11

People also ask

What is application side join?

An application-side join means that you actually make two queries (two separate requests to Elasticsearch) from the application side. The first query gets the list of IDs you need to filter on; the second query passes that list of IDs to a terms filter.
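The two-step flow can be sketched as follows. The request bodies use real query DSL (`term`, `terms`, `bool`/`filter`, `_source` filtering), but the index names, field names and the `search(index, body)` callable are assumptions; `fake_search` is a hypothetical in-memory stand-in that returns the hit list directly, whereas a real client response nests the hits under `hits.hits`.

```python
def first_query(port):
    # Step 1: find sessions matching some criterion (here: a destination port).
    return {"query": {"term": {"destination_port": port}},
            "_source": ["source_ip"], "size": 1000}

def second_query(ids):
    # Step 2: pass the collected IDs to a terms filter against the IP index.
    return {"query": {"bool": {"filter": {"terms": {"ip": sorted(ids)}}}}}

def application_side_join(search, port):
    """Run both queries through a caller-supplied search(index, body) function."""
    hits = search("sessions", first_query(port))
    ids = {h["_source"]["source_ip"] for h in hits}
    return search("ips", second_query(ids))

# Hypothetical in-memory data and search stand-in, for illustration only.
SESSIONS = [
    {"source_ip": "10.0.0.1", "destination_port": 443},
    {"source_ip": "10.0.0.3", "destination_port": 22},
    {"source_ip": "10.0.0.1", "destination_port": 443},
]
IPS = [{"ip": "10.0.0.1", "country": "NL"}, {"ip": "10.0.0.3", "country": "US"}]

def fake_search(index, body):
    # Understands only the two body shapes built above.
    if index == "sessions":
        port = body["query"]["term"]["destination_port"]
        return [{"_source": {"source_ip": s["source_ip"]}}
                for s in SESSIONS if s["destination_port"] == port]
    wanted = set(body["query"]["bool"]["filter"]["terms"]["ip"])
    return [{"_source": d} for d in IPS if d["ip"] in wanted]

print(application_side_join(fake_search, 443))
```

Deduplicating the IDs into a set before the second request keeps the terms list short; very large ID lists can still run into the terms query's limit on the number of terms (the `index.max_terms_count` setting).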

Can you do joins in Elasticsearch?

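Not in the relational sense. Instead, Elasticsearch offers two forms of join, which are designed to scale horizontally. First, documents may contain fields of type nested. These fields are used to index arrays of objects, where each object can be queried (with the nested query) as an independent document. Second, a join field can model parent/child relations between documents in the same index, which are queried with has_child and has_parent.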
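For the nested form, here is a sketch of a mapping and a `nested` query written as Python dicts (the JSON the REST API expects). It assumes a recent Elasticsearch version where mappings have no types (older versions such as the 1.x line from the question nested the properties under a mapping type); the field names `ip`, `flows`, `flows.dst_ip` and `flows.dst_port` are hypothetical.

```python
import json

# Mapping: each element of "flows" is indexed as its own hidden document.
mapping = {
    "mappings": {
        "properties": {
            "ip": {"type": "ip"},
            "flows": {
                "type": "nested",
                "properties": {
                    "dst_ip": {"type": "ip"},
                    "dst_port": {"type": "integer"},
                },
            },
        }
    }
}

# Query: both conditions must match inside the *same* flows element.
query = {
    "query": {
        "nested": {
            "path": "flows",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"flows.dst_ip": "192.0.2.1"}},
                        {"term": {"flows.dst_port": 443}},
                    ]
                }
            },
        }
    }
}

print(json.dumps(query, indent=2))
```

Note that nesting still stores the flows inside one parent document, so it does not by itself avoid the duplication problem described in the question.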

Can we join two indexes in Elasticsearch?

One challenge is that Elasticsearch does not support joins between indexes, so the data from the two indexes has to be combined some other way, for example in the application. Another challenge is that the data in each index may be structured differently, which makes combining it harder.

How do I merge two queries in Elasticsearch?

You can combine the queries using a bool query. Depending on your requirements, use should or must inside the bool clauses. You may want to search for both fields you want, and then aggregate by the most important field.
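A sketch of such a combined request body, built as a Python dict. The `bool` / `must` / `should` structure, `size: 0` and the `terms` aggregation are standard query DSL; the field names `destination_port`, `source_country` and `source_ip` and the aggregation name `by_field` are illustrative assumptions.

```python
import json

def combined_query(must_clauses, should_clauses, agg_field):
    """Combine clauses in one bool query and bucket the matches by agg_field."""
    return {
        "query": {
            "bool": {
                # Every must clause has to match.
                "must": must_clauses,
                # With must present, should clauses only boost the score
                # unless minimum_should_match is set.
                "should": should_clauses,
            }
        },
        # size 0: return only the aggregation buckets, not the hits themselves.
        "size": 0,
        "aggs": {"by_field": {"terms": {"field": agg_field}}},
    }

body = combined_query(
    [{"term": {"destination_port": 443}}],
    [{"term": {"source_country": "DE"}}, {"term": {"source_country": "FR"}}],
    "source_ip",
)
print(json.dumps(body, indent=2))
```

Whether the country clauses belong in `should` or `must` depends on whether they are meant as preferences or hard requirements.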


1 Answer

I don't think there is such a thing, but maybe you could play around with the filters:

  1. You create nice and simple data visualisations that filter on different types and display only one simple piece of data.
  2. You put these different visualizations in a dashboard in order to display all the data associated with a type of join.
  3. You use the filters as your join key and use the full dashboard, composed of different panels, to get insights into specific join keys (IPs in your case, or sessions).

You need to create one dashboard for every type of join that you want to make.

Note that you will need to harmonize the names and mappings of the fields in your different documents!
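Why the field names matter can be sketched with request bodies: conceptually, a dashboard-level filter is one extra filter clause added to every panel's query. This is a simplified model, not Kibana's exact request format; the helper names `shared_filter` and `panel_query` and the field names are hypothetical. If the session index called the field `source_ip` and the IP index called it `ip`, a single term filter could not match in both.

```python
def shared_filter(join_key_value, field="ip"):
    # One term filter on the join key; it matches in both indexes
    # only if both name the field identically.
    return {"term": {field: join_key_value}}

def panel_query(extra_clauses, join_filter):
    """Body for one panel: its own clauses plus the shared join filter."""
    return {"query": {"bool": {"must": extra_clauses, "filter": [join_filter]}}}

f = shared_filter("192.0.2.10")
# Panel over the session index: last day of sessions for this IP.
sessions_panel = panel_query([{"range": {"timestamp": {"gte": "now-1d"}}}], f)
# Panel over the IP index: the metadata document for this IP.
ip_panel = panel_query([], f)
print(sessions_panel)
print(ip_panel)
```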

Keep us updated; that's an interesting problem, and I would like to know how it turns out with so many documents.

Valentin Cavel answered Sep 23 '22 22:09