I am hosting an Elasticsearch cluster in EKS and I'd like to stream all CloudWatch log groups to this cluster via Kinesis Firehose. But AWS Kinesis Firehose doesn't support delivering data to any Elasticsearch cluster other than AWS-hosted ES (the Amazon Elasticsearch Service).
What is the best way to stream data to a self-hosted ES cluster?
Amazon Kinesis Data Firehose can now deliver streaming data to an Amazon Elasticsearch Service domain in an Amazon VPC.
Amazon Kinesis is fully managed and runs your streaming applications without requiring you to manage any infrastructure.
Kinesis Data Streams is a low-latency streaming service in AWS Kinesis built for ingestion at scale. Kinesis Firehose, on the other hand, serves as a data transfer service: its primary purpose is loading streaming data into Amazon S3, Splunk, Elasticsearch, and Redshift.
Kinesis Firehose then collects the data into batches and sends the batches to an Elasticsearch Service cluster. Kibana can be used to visualize the streaming data stored in the Elasticsearch cluster. Amazon Kinesis Data Firehose is the easiest way to reliably load streaming data into data stores and analytics tools.
If you then use that data stream as a source for your Kinesis Data Firehose delivery stream, Kinesis Data Firehose de-aggregates the records before it delivers them to the destination. If you configure your delivery stream to transform the data, Kinesis Data Firehose de-aggregates the records before it delivers them to AWS Lambda.
A Kinesis Data Firehose delivery stream is the underlying component of Firehose operations. The delivery stream automatically delivers data to the specified destination, such as Splunk, S3, or Redshift. Users also have the option of configuring Firehose to transform data before delivery.
Amazon Kinesis is a service provided by AWS for processing data in real time. In the scenario discussed here, data arrive at Kinesis as a stream, and Kinesis executes whatever processing we require on them. There are many other stream-processing solutions in the community as well.
I think the best way is by means of a Lambda transformation function for Firehose. For this to work, you would have to choose a supported destination, e.g. S3. The function is normally used to transform the records, but you can program whatever logic you want, including uploading the records to a custom ES cluster.
If you use Python, the function could use the elasticsearch library (packaged as a Lambda layer) to connect to your custom cluster and inject records into it. elasticsearch is the Python interface to ES and will work with any ES cluster.
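As a rough sketch of that Lambda (the endpoint URL and index name are placeholders, not from the question, and this version posts to the ES bulk API with the standard library instead of the elasticsearch client to stay self-contained):

```python
import base64
import json
import urllib.request

# Assumptions: the Lambda runs in (or can route to) the VPC hosting the cluster.
ES_BULK_URL = "http://elasticsearch.example.internal:9200/_bulk"  # hypothetical
ES_INDEX = "cloudwatch-logs"  # hypothetical index name


def build_bulk_body(records):
    """Decode Firehose records (base64-encoded data) into an ES bulk payload."""
    lines = []
    for rec in records:
        doc = base64.b64decode(rec["data"]).decode("utf-8")
        lines.append(json.dumps({"index": {"_index": ES_INDEX}}))
        lines.append(doc.strip())
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


def lambda_handler(event, context):
    body = build_bulk_body(event["records"])
    req = urllib.request.Request(
        ES_BULK_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
    )
    urllib.request.urlopen(req)  # retries and error handling omitted in this sketch

    # Report every record as Dropped so nothing lands in the S3 destination
    # that Firehose still requires you to configure.
    return {
        "records": [
            {"recordId": r["recordId"], "result": "Dropped"}
            for r in event["records"]
        ]
    }
```

Returning `Dropped` for each record keeps the mandatory S3 destination effectively empty; if you also want an S3 copy as a backup, return `Ok` with the original data instead.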
An alternative is to use an HTTP endpoint as your destination. In this scenario, you could run a small EC2 instance or container that receives the records from Firehose and pushes them to ES. Just as before, the elasticsearch library could be used with Python.
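A minimal receiver for that approach might look like the sketch below, using only the standard library. Firehose's HTTP endpoint delivery sends a JSON body containing a `requestId` and base64-encoded `records`, and expects the response to echo the same `requestId`; the port and the ES-push step here are placeholders.

```python
import base64
import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


def decode_firehose_batch(body):
    """Decode the documents in a Firehose HTTP-endpoint request body and
    build the acknowledgement response Firehose expects."""
    payload = json.loads(body)
    docs = [base64.b64decode(r["data"]).decode("utf-8") for r in payload["records"]]
    response = {
        "requestId": payload["requestId"],  # must echo Firehose's request id
        "timestamp": int(time.time() * 1000),
    }
    return docs, response


class FirehoseEndpoint(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        docs, response = decode_firehose_batch(self.rfile.read(length))
        for doc in docs:
            pass  # push each doc to your ES cluster here (sketch)
        body = json.dumps(response).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


# To run the endpoint (blocks forever):
# HTTPServer(("0.0.0.0", 8080), FirehoseEndpoint).serve_forever()
```

In production you would put this behind HTTPS (Firehose requires an HTTPS endpoint) and validate the access key Firehose sends, but the request/response shape is the essential part.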