
Good practice when using Kafka with JPA

I'm currently on a project where JPA and Kafka are used. I'm trying to find a set of good practices for combining those operations.

In the existing code, the producer is called within the same transaction as the JPA operations; however, from what I have read, it seems that they don't actually share a transaction.

@PostMapping
@Transactional
public XDto createX(@RequestBody XRequest request) {
    XDto dto = xService.create(request);
    kafkaProducer.putToQueue(dto, Type.CREATE);
    return dto;
}

where the Kafka producer is defined as follows:

@Component
public class KafkaProducer {
    @Autowired
    private KafkaTemplate<String, Event> template;

    public void putToQueue(Dto dto, Type eventType) {
        template.send("event", new Event(dto, eventType));
    }
}

Is this a valid use case for combining JPA and Kafka, and are the transaction boundaries defined correctly?

asked Jun 26 '18 by zibi

3 Answers

This will not work as intended when the transaction fails: the Kafka interaction is not part of the transaction.

You may want to have a look at TransactionalEventListener and write the message to Kafka in the AFTER_COMMIT phase; a sketch is shown below. Even then, the Kafka publish itself may still fail.
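For example, a minimal sketch of that approach using Spring's ApplicationEventPublisher (each class in its own file). XCreatedEvent, XRepository and the X.from / XDto.from mapping helpers are hypothetical names introduced for illustration, not code from your project:

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.ApplicationEventPublisher;
import org.springframework.stereotype.Component;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;
import org.springframework.transaction.event.TransactionPhase;
import org.springframework.transaction.event.TransactionalEventListener;

// Hypothetical in-memory event published inside the JPA transaction.
public class XCreatedEvent {
    private final XDto dto;
    public XCreatedEvent(XDto dto) { this.dto = dto; }
    public XDto getDto() { return dto; }
}

@Service
public class XService {
    @Autowired
    private XRepository xRepository;             // hypothetical JPA repository
    @Autowired
    private ApplicationEventPublisher publisher;

    @Transactional
    public XDto create(XRequest request) {
        X entity = xRepository.save(X.from(request));   // persist via JPA
        XDto dto = XDto.from(entity);
        publisher.publishEvent(new XCreatedEvent(dto)); // held until the transaction outcome is known
        return dto;
    }
}

@Component
public class XCreatedEventListener {
    @Autowired
    private KafkaProducer kafkaProducer;

    // Runs only after the JPA transaction has committed successfully;
    // a rolled-back transaction never reaches Kafka. The send itself can still fail.
    @TransactionalEventListener(phase = TransactionPhase.AFTER_COMMIT)
    public void onXCreated(XCreatedEvent event) {
        kafkaProducer.putToQueue(event.getDto(), Type.CREATE);
    }
}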

Another option is to write to the database using JPA, as you are doing, and let Debezium read the changed data from your database and push it to Kafka. The event will be in a different format, but far richer.

answered by gagan singh


By looking at your question, I'm assuming that you are trying to achieve CDC (Change Data Capture) of your OLTP system, i.e. capture every change that goes into the transactional database. There are two ways to approach this.

  1. The application code does dual writes to the transactional DB as well as to Kafka. This is inconsistent and hurts performance: inconsistent because, when you write to two independent systems, the data diverges whenever either of the writes fails; and slower because pushing data to Kafka inside the transaction flow adds latency you don't want to pay.
  2. Extract changes from the DB commit (via database/application-level triggers or the transaction log) and send them to Kafka. This is consistent and doesn't affect your transaction at all, because the commit log only reflects transactions after they have successfully committed. There are many solutions that leverage this approach, such as Databus, Maxwell, Debezium, etc.

If CDC is your use case, try using any of the already available solutions.

answered by pushpavanthar


As others have said, you could use change data capture to safely propagate the changes applied to your database to Apache Kafka. You cannot update the database and Kafka in a single transaction as the latter doesn't support any kind of 2-phase-commit protocol.

You might either CDC the tables themselves or, if you wish to have more control over the structure of what is sent to Kafka, apply the "outbox" pattern. In that case, your application writes to its actual business tables as well as to an "outbox" table which contains the messages to send to Kafka; a rough sketch follows below. You can find a detailed description of this approach in this blog post.
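For illustration, a rough sketch of the outbox write, under the assumption of a hypothetical OutboxEvent entity, an OutboxEventRepository, and Jackson's ObjectMapper for the payload; a CDC connector (e.g. Debezium) then streams the outbox table to Kafka:

import java.time.Instant;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

// Hypothetical outbox row; a CDC connector tails this table and publishes to Kafka.
@Entity
public class OutboxEvent {
    @Id @GeneratedValue
    private Long id;
    private String aggregateType;   // e.g. "X"
    private String aggregateId;
    private String eventType;       // e.g. "CREATE"
    private String payload;         // serialized DTO, e.g. JSON
    private Instant createdAt = Instant.now();
    // getters and setters omitted for brevity
}

@Service
public class XService {
    @Autowired
    private XRepository xRepository;                // hypothetical JPA repositories
    @Autowired
    private OutboxEventRepository outboxRepository;
    @Autowired
    private ObjectMapper objectMapper;              // Jackson, to serialize the payload

    // The business row and the outbox row are written in ONE database transaction,
    // so either both are persisted or neither is; there is no direct Kafka call here.
    @Transactional
    public XDto create(XRequest request) throws JsonProcessingException {
        X entity = xRepository.save(X.from(request));
        XDto dto = XDto.from(entity);

        OutboxEvent outbox = new OutboxEvent();
        outbox.setAggregateType("X");
        outbox.setAggregateId(String.valueOf(entity.getId()));
        outbox.setEventType("CREATE");
        outbox.setPayload(objectMapper.writeValueAsString(dto));
        outboxRepository.save(outbox);

        return dto;
    }
}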

Disclaimer: I'm the author of this post and the lead of Debezium, one of the CDC solutions mentioned in some of the other answers.

answered by Gunnar