Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Get stream of data from mqtt using python(pyspark) in spark version 2.2.0

I added the dependency in spark's POM.xml, given in the following link:

http://bahir.apache.org/docs/spark/current/spark-sql-streaming-mqtt/

Built spark using maven again. But as we can see it shows only Java and Scala supports to get data from mqtt.

I want to get stream data from mqtt in python. In earlier versions we had a pyspark.streaming.mqtt for the same. What is the similar to this in spark 2.2.0 pyspark. I am using mosquitto for mqtt broker.

like image 420
awisha Avatar asked Nov 07 '22 17:11

awisha


1 Answers

For PySpark you can use Structured Streaming bindings (you have to include Bahir jar):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # type: SparkSession
(spark
    .readStream
    .format("org.apache.bahir.sql.streaming.mqtt.MQTTStreamSourceProvider")
    .load("tcp://{}".format(broker_uri)))
like image 180
Alper t. Turker Avatar answered Nov 14 '22 21:11

Alper t. Turker