I am using the Hortonworks sandbox.
Creating the topic:
./kafka-topics.sh --create --zookeeper 10.25.3.207:2181 --replication-factor 1 --partitions 1 --topic lognew
Tailing the Apache access log file:
tail -f /var/log/httpd/access_log | ./kafka-console-producer.sh --broker-list 10.25.3.207:6667 --topic lognew
In another terminal (in the Kafka bin directory), starting the consumer:
./kafka-console-consumer.sh --zookeeper 10.25.3.207:2181 --topic lognew --from-beginning
The Apache access logs are sent to the Kafka topic "lognew".
I need to store them in HDFS.
Any ideas or suggestions on how to do this?
Thanks in advance.
Deepthy
We use Camus.
Camus is a simple MapReduce job developed by LinkedIn to load data from Kafka into HDFS. It is capable of incrementally copying data from Kafka into HDFS such that every run of the MapReduce job picks up where the previous run left off. At LinkedIn, Camus is used to load billions of messages per day from Kafka into HDFS.
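For reference, a minimal sketch of a Camus run against your topic could look like the following. The HDFS paths and jar version are illustrative assumptions; the decoder and writer classes are the plain-string ones that ship with Camus, which suit one access-log line per message:

camus.properties:

# HDFS locations for ingested data and Camus's own execution metadata (illustrative paths)
etl.destination.path=/user/deepthy/kafka/data
etl.execution.base.path=/user/deepthy/camus/exec
etl.execution.history.path=/user/deepthy/camus/exec/history

# Kafka broker and the topic to pull
kafka.brokers=10.25.3.207:6667
kafka.whitelist.topics=lognew

# Treat each message as a plain string rather than Avro
camus.message.decoder.class=com.linkedin.camus.etl.kafka.coders.StringMessageDecoder
etl.record.writer.provider.class=com.linkedin.camus.etl.kafka.common.StringRecordWriterProvider

The job is then submitted as an ordinary MapReduce job (the jar name depends on how you built Camus):

hadoop jar camus-example-0.1.0-SNAPSHOT-shaded.jar com.linkedin.camus.etl.kafka.CamusJob -P camus.properties

Each run resumes from the offsets recorded by the previous run, which is what gives Camus its incremental behaviour.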
But it looks like it has been replaced by Gobblin:
Gobblin is a universal data ingestion framework for extracting, transforming, and loading large volumes of data from a variety of data sources, e.g., databases, REST APIs, FTP/SFTP servers, filers, etc., onto Hadoop. Gobblin handles the common routine tasks required for all data ingestion ETLs, including job/task scheduling, task partitioning, error handling, state management, data quality checking, data publishing, etc. Gobblin ingests data from different data sources in the same execution framework, and manages metadata of different sources all in one place. This, combined with other features such as auto scalability, fault tolerance, data quality assurance, extensibility, and the ability to handle data model evolution, makes Gobblin an easy-to-use, self-serving, and efficient data ingestion framework.
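If you go the Gobblin route instead, its Kafka-HDFS ingestion example is driven by a .pull job file. A hedged sketch along the lines of that example (class and property names from the pre-Apache gobblin.* packages, not tested on the sandbox) might be:

# kafka-to-hdfs.pull (illustrative file name)
job.name=LognewKafkaToHdfs
job.group=kafka

# Simple byte-for-byte extraction from Kafka
source.class=gobblin.source.extractor.extract.kafka.KafkaSimpleSource
extract.namespace=gobblin.extract.kafka

# Pull only the lognew topic, from the earliest offset on the first run
topic.whitelist=lognew
bootstrap.with.offset=earliest
kafka.brokers=10.25.3.207:6667

# Write records as plain text into a per-topic directory on HDFS
writer.builder.class=gobblin.writer.SimpleDataWriterBuilder
writer.file.path.type=tablename
writer.destination.type=HDFS
writer.output.format=txt

data.publisher.type=gobblin.publisher.BaseDataPublisher
mr.job.max.mappers=1

Per the Gobblin docs this can be run in MapReduce mode with something like:

gobblin-mapreduce.sh --conf kafka-to-hdfs.pull

Like Camus, Gobblin tracks offsets between runs, so repeated executions only ingest new messages.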