
Pushing avro file to Kafka

I have an existing Avro file and I want to push its data into Kafka, but the following command is not working:

/usr/bin/kafka-console-producer --broker-list test:9092 --topic test < part-m-00000.avro

Thanks

bruce asked Jan 28 '23 16:01


2 Answers

You need to first download the avro-tools JAR file

Then get the schema from the file

java -jar avro-tools.jar getschema part-m-00000.avro > schema.avsc

Then install jq, because it will help format that schema file in a minute.

Next, Avro messages in Kafka ideally should not contain the schema for every single record, so it would improve your overall topic throughput and network usage if you installed the Avro Schema Registry from Confluent (or the one from Hortonworks, but I've yet to try it).

After that's working and you have the rest of the Confluent Platform downloaded, there's a script for producing Avro data, but to use it you need JSON records from the Avro file. Use avro-tools again to get them:

java -jar avro-tools.jar tojson part-m-00000.avro > records.json

Note - this output file will be significantly larger than the Avro file

Now you can produce using the schema, which will be sent to the registry. The JSON records are converted to binary Avro by applying the schema, and that binary data is what gets written to the topic.

bin/kafka-avro-console-producer \
    --broker-list localhost:9092 --topic test \
    --property schema.registry.url=http://localhost:8081 \
    --property value.schema="$(jq -r tostring schema.avsc)" < records.json

Note: Run jq -r tostring schema.avsc on its own before this command, and make sure the output is not an escaped JSON string.
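That check can also be scripted. The following is a minimal sketch, assuming a POSIX shell; is_raw_schema is a hypothetical helper name, not part of jq or the Confluent tools. It only looks at the shape of the string (one line, starting with { and ending with }) and does not validate the schema itself.

```shell
# Hypothetical helper (not part of jq or Confluent): succeeds only when the
# argument is a single-line string that starts with "{" and ends with "}",
# i.e. a raw JSON object rather than a quoted, escaped JSON string.
is_raw_schema() {
  newline='
'
  case "$1" in
    *"$newline"*) return 1 ;;   # pretty-printed or otherwise multi-line
    \{*\})        return 0 ;;   # looks like a compact JSON object
    *)            return 1 ;;   # e.g. "{\"type\":...}" wrapped in quotes
  esac
}
```

One possible use before calling the producer:

is_raw_schema "$(jq -r tostring schema.avsc)" || echo "schema.avsc did not yield a plain JSON object" >&2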


If that output JSON file is too large, you might also be able to stream the avro-tools output into the producer

Replace

< records.json 

With

< <(java -jar avro-tools.jar tojson part-m-00000.avro)
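The same streaming idea can be written as a plain pipe, which avoids the intermediate records.json file and works in any POSIX shell. This is only a sketch: stream_avro, AVRO_TO_JSON and PRODUCER are hypothetical names introduced here, and the broker and registry addresses are the ones assumed elsewhere in this answer.

```shell
# Assumed defaults, taken from the commands in this answer; override as needed.
AVRO_TO_JSON=${AVRO_TO_JSON:-"java -jar avro-tools.jar tojson"}
PRODUCER=${PRODUCER:-"bin/kafka-avro-console-producer"}

# Hypothetical wrapper: stream_avro FILE TOPIC SCHEMA_JSON
# Pipes the JSON records from avro-tools straight into the producer's stdin.
stream_avro() {
  # $AVRO_TO_JSON and $PRODUCER are left unquoted on purpose so that
  # "java -jar ..." is split into a command and its arguments.
  $AVRO_TO_JSON "$1" |
    $PRODUCER \
      --broker-list localhost:9092 --topic "$2" \
      --property schema.registry.url=http://localhost:8081 \
      --property value.schema="$3"
}
```

For example:

stream_avro part-m-00000.avro test "$(jq -r tostring schema.avsc)"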

Alternative solutions would include reading the Avro files in Spark, then forwarding those records to Kafka

OneCricketeer answered Feb 05 '23 17:02


If you want to publish Avro messages, you can try kafka-avro-console-producer.

$ ./bin/kafka-avro-console-producer \
             --broker-list localhost:9092 --topic test \
             --property value.schema='{"type":"record","name":"myrecord","fields":[{"name":"f1","type":"string"}]}'  < avrofile.avro

It is part of the Confluent open source package. For more details, see https://docs.confluent.io/3.0.0/quickstart.html

P.S. I could not find these commands in the latest version.

Nishu Tayal answered Feb 05 '23 17:02