 

Kafka to Elasticsearch, HDFS with Logstash or Kafka Streams/Connect

I use Kafka as a message queue and for processing. My question is about performance and best practice. I will run my own performance tests, but maybe someone already has results or experience to share.

The data arrives raw in a Kafka (0.10) topic, and I want to transfer it in structured form to Elasticsearch and HDFS.

Now I see 2 possibilities:

  • Logstash (Kafka input plugin, grok filter (parsing), ES/webhdfs output plugin)
  • Kafka Streams (parsing), Kafka Connect (ES sink, HDFS sink)
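
The first option could be sketched as a single Logstash pipeline; the topic, index, NameNode host, and grok pattern below are hypothetical placeholders, not taken from the question:

```
input {
  kafka {
    bootstrap_servers => "localhost:9092"
    topics => ["raw-logs"]                            # hypothetical topic name
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }  # assumes Apache-style log lines
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logs-%{+YYYY.MM.dd}"
  }
  webhdfs {
    host => "namenode.example.com"                    # hypothetical NameNode
    port => 50070
    path => "/logs/%{+YYYY-MM-dd}/logstash-%{+HH}.log"
    user => "hdfs"
  }
}
```

One pipeline covers both sinks, which is the main operational appeal of this option.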

Without having run any tests, I would say that the second option is better, cleaner, and more reliable. Is that correct?

asked Nov 08 '22 by imehl

1 Answer

Logstash is often considered the "best practice" for getting data into Elasticsearch. WebHDFS won't have the raw performance of the Java API that the Kafka Connect HDFS sink uses, however.

Grok-style parsing could also be done in a Kafka Streams process, so the parsing step can live in either place.
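
As a sketch of the Streams-side parsing, the grok work boils down to a regex-to-map function that a topology could apply via `mapValues`. The log format, field names, and topic names here are assumptions for illustration, not from the question:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogParser {
    // Hypothetical access-log format: "<ip> <method> <path> <status>"
    private static final Pattern LINE =
        Pattern.compile("(?<ip>\\S+) (?<method>\\S+) (?<path>\\S+) (?<status>\\d{3})");

    // Grok-style parse: raw line in, named fields out (null if the line doesn't match)
    public static Map<String, String> parse(String raw) {
        Matcher m = LINE.matcher(raw);
        if (!m.matches()) return null;
        Map<String, String> fields = new HashMap<>();
        fields.put("ip", m.group("ip"));
        fields.put("method", m.group("method"));
        fields.put("path", m.group("path"));
        fields.put("status", m.group("status"));
        return fields;
    }

    public static void main(String[] args) {
        // In a real Streams topology this function would run inside something like:
        //   builder.stream("raw-logs").mapValues(LogParser::parse).to("parsed-logs");
        System.out.println(parse("10.0.0.1 GET /index.html 200"));
    }
}
```

The parsed topic can then feed the Connect ES and HDFS sinks without either sink needing to know the raw format.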

If you are on an Elastic subscription, then they would like to sell Logstash. Confluent would like to sell Kafka Streams + Kafka Connect.

Avro seems to be the best medium for data transfer, and the Schema Registry is a popular way to do that. IIUC, Logstash doesn't work well with a Schema Registry or Avro, and prefers JSON.
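
For the Connect side, an Elasticsearch sink reading Avro through the Schema Registry might be configured like this; the connector name, topic, and URLs are hypothetical:

```
name=es-sink
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
topics=parsed-logs
connection.url=http://localhost:9200
key.ignore=true
# Avro via Schema Registry (Confluent converters)
key.converter=io.confluent.connect.avro.AvroConverter
value.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter.schema.registry.url=http://localhost:8081
```

With Avro and the registry in place, the same converter settings carry over to the HDFS sink, which is part of why this stack fits together cleanly.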


In the Hadoop landscape, I would offer the intermediate options of Apache NiFi or StreamSets.

In the end, it really depends on your priorities, and how well you (and your team) can support these tools.

answered Nov 15 '22 by OneCricketeer