
Kafka + AWS lambda


Is it possible to integrate AWS Lambda with Apache Kafka? I want to put a consumer in a Lambda function, so that the function executes whenever the consumer receives a message.

lolix asked Apr 03 '17 12:04


People also ask

Can AWS Lambda listen to Kafka topic?

AWS Lambda now supports Amazon Managed Streaming for Apache Kafka (Amazon MSK) as an event source, giving customers more choices to build serverless applications with streaming data. Customers can build Apache Kafka consumer applications with Lambda functions without needing to worry about infrastructure management.

Can Lambda publish to Kafka?

Records within different topic partitions, though, can be processed in parallel. If configured, the response from AWS Lambda can be written to a Kafka topic.

Can Kafka be used in AWS?

AWS also offers Amazon MSK, the most compatible, available, and secure fully managed service for Apache Kafka, enabling customers to populate data lakes, stream changes to and from databases, and power machine learning and analytics applications.


2 Answers

Continuing the point by Arafat: we have successfully built an infrastructure to consume from Kafka using AWS Lambdas. Here are some gotchas:

  • Make sure to batch messages and commit offsets consistently while consuming.
  • If you are storing the batches to S3, make sure to close your file descriptors.
  • If you are forwarding the batches to another service, make sure to clear the variables. Lambda can reuse an execution environment between invocations, so variables that outlive the handler can accumulate and exhaust memory.
  • A good idea is to check how much time you have left, using the context object in the Lambda, and give yourself some wiggle room to do something with the buffer you populated in your consumer, which might not be written to a file unless you call close().
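The points above could be sketched roughly like this in Python. This is only an illustration, not our actual code: `consumer` is assumed to behave like kafka-python's `KafkaConsumer` (created with auto-commit disabled), while `consume_batches`, `handle_batch` and `SAFETY_MARGIN_MS` are hypothetical names made up for the example. The only Lambda-specific call is `context.get_remaining_time_in_millis()`.

```python
SAFETY_MARGIN_MS = 10_000  # hypothetical wiggle room before the Lambda timeout


def consume_batches(consumer, context, handle_batch, max_records=500):
    """Poll, handle, and commit in batches until time is nearly up.

    `consumer` is assumed to expose poll() and commit() like kafka-python's
    KafkaConsumer; `handle_batch` is a hypothetical callback that stores or
    forwards one batch (and closes any file it opened).
    """
    total = 0
    # Stop early so there is time left to flush and close whatever is open.
    while context.get_remaining_time_in_millis() > SAFETY_MARGIN_MS:
        # poll() returns {partition: [records]}, or an empty dict if idle.
        polled = consumer.poll(timeout_ms=1000, max_records=max_records)
        batch = [rec for records in polled.values() for rec in records]
        if not batch:
            break
        handle_batch(batch)
        # Commit only after the batch has been handled successfully.
        consumer.commit()
        total += len(batch)
    # `batch` is local to this function, so nothing lingers in a reused
    # execution environment once the handler returns.
    return total
```

Inside a handler you would call something like `consume_batches(consumer, context, write_batch_to_s3)`, where `write_batch_to_s3` is again a placeholder for whatever you do with each batch.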

We are using Apache Airflow to schedule the Lambda invocations. I hear CloudWatch can do that too.
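For the CloudWatch route, which we have not used ourselves, a scheduled rule that invokes the function might be set up roughly as below. The boto3 calls (`put_rule`, `add_permission`, `put_targets`) are real, but `schedule_consumer`, the rule name and the five-minute rate are assumptions for the example; the clients are passed in so that the sketch does not hard-code a region or credentials.

```python
def schedule_consumer(events_client, lambda_client, function_arn,
                      rule_name="kafka-consumer-schedule",  # hypothetical name
                      expression="rate(5 minutes)"):         # hypothetical rate
    """Create a scheduled rule that invokes the consumer Lambda.

    events_client and lambda_client are assumed to be boto3 clients,
    e.g. boto3.client("events") and boto3.client("lambda").
    """
    # Create (or update) the schedule rule.
    rule = events_client.put_rule(
        Name=rule_name,
        ScheduleExpression=expression,
        State="ENABLED",
    )
    # Allow the rule to invoke the function.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )
    # Point the rule at the function.
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "1", "Arn": function_arn}],
    )
    return rule["RuleArn"]
```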

darthsidious answered Oct 08 '22 23:10


Here is the AWS article on scheduled Lambdas.

Given that your Kafka installation will be running in a VPC, best practice is to configure your Lambda to run within the VPC as well - this will simplify the security group configuration for the EC2 instances running Kafka.

Here is the AWS blog article on configuring Lambdas to run in a VPC.
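As a rough sketch of what that configuration amounts to, the function can be attached to the VPC through boto3's `update_function_configuration` and its `VpcConfig` parameter. The helper name `attach_to_vpc` is made up for the example, and the subnet and security group IDs are whatever your Kafka VPC uses; the client is passed in so nothing about region or credentials is assumed.

```python
def attach_to_vpc(lambda_client, function_name, subnet_ids, security_group_ids):
    """Run an existing Lambda function inside the VPC that hosts Kafka.

    lambda_client is assumed to be boto3.client("lambda"); the IDs are
    the caller's own subnets and security groups.
    """
    return lambda_client.update_function_configuration(
        FunctionName=function_name,
        VpcConfig={
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
        },
    )
```

The security group on the Kafka instances can then allow inbound traffic from the Lambda's security group rather than from a range of addresses.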

Geoff answered Oct 09 '22 00:10