I am not able to use the kafka library in a Databricks notebook. I am getting this error:
ImportError: No module named 'kafka'
from kafka import KafkaProducer

def send_to_kafka(rows):
    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    for row in rows:
        # kafka-python expects bytes, so encode the row dict before sending
        producer.send('topic', str(row.asDict()).encode('utf-8'))
    producer.flush()

df.foreachPartition(send_to_kafka)
Databricks cluster info:
Spark version - 2.4.3
Scala version - 2.11
Help me. Thanks in advance.
Instead of doing this (it's very inefficient), just use the Kafka connector to write the data, like this (you first need to convert the data into a JSON string):
from pyspark.sql.functions import to_json, struct

df.select(to_json(struct("*")).alias("value")) \
    .write.format("kafka") \
    .option("kafka.bootstrap.servers", "host1:port1,host2:port2") \
    .option("topic", "topic1") \
    .save()
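If each record should also carry a Kafka message key (for example to control partitioning), the connector picks it up from a key column written alongside value. A minimal sketch, assuming your DataFrame has an id column to use as the key (the column name here is just an illustration):

from pyspark.sql.functions import to_json, struct, col

# Assumes df has an "id" column (hypothetical) to use as the message key
df.select(
        col("id").cast("string").alias("key"),
        to_json(struct("*")).alias("value")
    ) \
    .write.format("kafka") \
    .option("kafka.bootstrap.servers", "host1:port1,host2:port2") \
    .option("topic", "topic1") \
    .save()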