
How does Kinesis Firehose stream data to self-managed Elasticsearch?

I am hosting an Elasticsearch cluster in EKS and I'd like to stream all CloudWatch log groups to this cluster via Kinesis Firehose. But AWS Kinesis Firehose doesn't support streaming data to an Elasticsearch cluster other than AWS-hosted ES.

What is the best way to stream data to a self-hosted ES cluster?

Joey Yi Zhao, asked Mar 01 '21 11:03


1 Answer

I think the best way is to use a Lambda function for Firehose. For this to work, you would have to choose a supported destination, e.g. S3. The function is normally used to transform the records, but you can program whatever logic you want, including uploading the records to a custom ES cluster.

If you use Python, the function could use an elasticsearch layer to connect to your custom cluster and inject records into it. elasticsearch is the Python interface to ES, and it will work with any ES cluster.
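A minimal sketch of such a function follows. It is not a definitive implementation and rests on several assumptions: the records are CloudWatch Logs subscription payloads (gzip-compressed JSON), the cluster URL and index name come from hypothetical `ES_URL` and `ES_INDEX` environment variables, and the elasticsearch package is available to the function (for example through a layer). The `bulk` parameter is an illustrative hook for swapping in a stub during testing, not part of any AWS API.

```python
import base64
import gzip
import json
import os

# Hypothetical index name; override with the ES_INDEX environment variable.
INDEX = os.environ.get("ES_INDEX", "cloudwatch-logs")


def decode_record(data_b64):
    """Decode one Firehose record; CloudWatch Logs payloads arrive gzip-compressed."""
    raw = base64.b64decode(data_b64)
    if raw[:2] == b"\x1f\x8b":  # gzip magic bytes
        raw = gzip.decompress(raw)
    return json.loads(raw)


def to_bulk_actions(payload, index):
    """Turn one CloudWatch Logs payload into actions for elasticsearch.helpers.bulk."""
    if payload.get("messageType") != "DATA_MESSAGE":
        return []  # control messages carry no log events
    return [
        {
            "_index": index,
            "_id": event["id"],
            "_source": {
                "@timestamp": event["timestamp"],  # epoch milliseconds
                "message": event["message"],
                "log_group": payload.get("logGroup"),
                "log_stream": payload.get("logStream"),
            },
        }
        for event in payload.get("logEvents", [])
    ]


def lambda_handler(event, context, bulk=None):
    """Firehose transformation handler that also indexes records into ES."""
    if bulk is None:
        # Imported lazily so the helpers above work without the package installed.
        from elasticsearch import Elasticsearch, helpers

        es = Elasticsearch(os.environ["ES_URL"])  # e.g. http://my-es:9200 (assumed)

        def bulk(actions):
            helpers.bulk(es, actions)

    results = []
    for record in event["records"]:
        try:
            actions = to_bulk_actions(decode_record(record["data"]), INDEX)
            if actions:
                bulk(actions)
            result = "Ok"
        except Exception:
            result = "ProcessingFailed"
        # Return the record unchanged so Firehose still delivers it to S3.
        results.append(
            {"recordId": record["recordId"], "result": result, "data": record["data"]}
        )
    return {"records": results}
```

Because the records are passed through unchanged, the S3 destination ends up holding a copy of everything that was sent to ES, which can serve as a backup if indexing fails.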

An alternative is to use an HTTP endpoint as your destination. In this scenario, you could run a small service, for example on an EC2 instance or in a container, which would receive the records from Firehose and then push them to ES. Just like before, the elasticsearch library could be used with Python.
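A rough sketch of such a receiver, using only the Python standard library for the HTTP part, might look like this. It assumes the request and response shapes of the Firehose HTTP endpoint delivery format (a JSON body with `requestId` and base64-encoded `records`, answered with a JSON body echoing `requestId`). The cluster URL, index name, and port are hypothetical, and TLS termination and access-key checking are left out, so treat it as a starting point only.

```python
import base64
import gzip
import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_delivery(body, index_docs):
    """Process one Firehose delivery request body; return (status, response dict)."""
    request = json.loads(body)
    response = {
        "requestId": request.get("requestId"),
        "timestamp": int(time.time() * 1000),
    }
    try:
        docs = []
        for record in request.get("records", []):
            raw = base64.b64decode(record["data"])
            if raw[:2] == b"\x1f\x8b":  # gzip magic bytes (CloudWatch Logs payloads)
                raw = gzip.decompress(raw)
            docs.append(json.loads(raw))
        index_docs(docs)
        return 200, response
    except Exception as exc:
        # A non-200 status tells Firehose the delivery failed so it can retry.
        response["errorMessage"] = str(exc)
        return 500, response


def index_into_es(docs, url="http://elasticsearch:9200", index="firehose-logs"):
    """Bulk-index documents; url and index are placeholder values."""
    from elasticsearch import Elasticsearch, helpers  # imported lazily

    es = Elasticsearch(url)
    helpers.bulk(es, ({"_index": index, "_source": doc} for doc in docs))


class FirehoseHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, response = handle_delivery(self.rfile.read(length), index_into_es)
        payload = json.dumps(response).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Start the receiver; call this from your entry point."""
    HTTPServer(("0.0.0.0", port), FirehoseHandler).serve_forever()
```

Keeping the request handling in `handle_delivery`, separate from the HTTP server class, makes it possible to exercise the decoding and response logic without a running server or an ES cluster.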

Marcin, answered Nov 15 '22 01:11