
Does the Kafka Python API support stream processing?

I have used Kafka Streams in Java, but I could not find a similar API in Python. Does Apache Kafka support stream processing in Python?

asked Aug 19 '18 14:08 by user3126637




1 Answer

Kafka Streams is only available as a JVM library, but there are at least two Python implementations of it:

  • robinhood/faust (not maintained as of 2020, but forked as faust-streaming/faust)
  • wintoncode/winton-kafka-streams (appears not to be maintained)
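To give a feel for the Faust route, here is a minimal sketch of a stateless filter, roughly the equivalent of `filter().to()` in Kafka Streams. It assumes the maintained fork is installed (`pip install faust-streaming`) and that values are JSON objects; the topic names, the `amount` field, the threshold, and the broker address are all made up for illustration.

```python
def is_large(event, threshold=100):
    """Return True when a decoded JSON event has an 'amount' above the threshold."""
    return event.get("amount", 0) > threshold


def build_app(broker="kafka://localhost:9092"):
    # Imported lazily so the predicate above can be exercised without Faust installed.
    import faust  # assumption: provided by the faust-streaming package

    app = faust.App("large-orders-app", broker=broker)
    orders = app.topic("orders")              # hypothetical input topic
    large_orders = app.topic("large-orders")  # hypothetical output topic

    @app.agent(orders)
    async def forward_large(stream):
        # Consume each event, keep the large ones, and publish them downstream.
        async for event in stream:
            if is_large(event):
                await large_orders.send(value=event)

    return app
```

Calling `build_app().main()` from a script and running that script with the `worker` argument should start the agent. Stateful operations such as counts would use Faust tables rather than a plain filter like this one.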

In theory, you could try playing with Jython or Py4J to work with the JVM implementation, but that would probably require more work than necessary.

Outside of those options, you can also try Apache Beam, Flink or Spark, but they each require an external cluster scheduler to scale out (and also require a Java installation).

If you are okay with HTTP methods, then running a ksqlDB instance (again, requiring Java for that server) and invoking its REST interface from Python with the built-in SQL functions can work. However, building your own functions there will require writing Java code, last I checked.
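A rough sketch of the REST approach using only the standard library is below. It assumes a ksqlDB server on its default port 8088 and uses the `/query` endpoint with the `application/vnd.ksql.v1+json` content type; the helper names are hypothetical, and the response handling only suits pull queries, since push queries stream results and never finish.

```python
import json
import urllib.request


def build_ksql_request(statement, server="http://localhost:8088", endpoint="/query"):
    """Build (but do not send) a POST request carrying one ksqlDB statement."""
    body = json.dumps({"ksql": statement, "streamsProperties": {}}).encode("utf-8")
    return urllib.request.Request(
        server.rstrip("/") + endpoint,
        data=body,
        headers={"Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"},
        method="POST",
    )


def run_pull_query(statement, **kwargs):
    """Send the request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(build_ksql_request(statement, **kwargs)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `run_pull_query("SELECT * FROM word_counts WHERE word = 'kafka';")` would query a hypothetical materialized table named `word_counts`.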

If none of those options are suitable, then you're stuck with the basic consumer/producer methods.
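With the plain clients, "stream processing" boils down to a consume-transform-produce loop that you write yourself. The sketch below keeps the loop independent of any particular client: it only assumes the consumer is iterable and yields objects with a `.value` attribute, and that the producer has `send(topic, value)`, which is how kafka-python's `KafkaConsumer` and `KafkaProducer` behave. The `name` field and the transform itself are invented for the example.

```python
import json


def transform(value):
    """Example per-record step: parse JSON and upper-case a hypothetical 'name' field."""
    record = json.loads(value)
    record["name"] = record["name"].upper()
    return json.dumps(record).encode("utf-8")


def process(consumer, producer, out_topic):
    """Read each record from the consumer, transform it, and send it to out_topic.

    Returns the number of records handled. With a real Kafka consumer the
    loop normally runs until the process is stopped.
    """
    count = 0
    for message in consumer:
        producer.send(out_topic, transform(message.value))
        count += 1
    return count
```

With kafka-python this could be called as `process(KafkaConsumer("in", bootstrap_servers="localhost:9092"), KafkaProducer(bootstrap_servers="localhost:9092"), "out")`, where `in` and `out` are placeholder topic names. Anything Kafka Streams gives you beyond this, such as state stores, windowing, and joins, would have to be built by hand.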

answered Oct 01 '22 17:10 by OneCricketeer