How to look up and update the state of a record from a database in Apache Flink?

Tags:

I am working on a data streaming application and I am investigating the possibility of using Apache Flink for this project. The main reason for this is that it supports nice high-level streaming constructs, very similar to Java 8's Stream API.

I will be receiving events which correspond to a specific record in a database, and I want to be able to process these events (coming from a message broker such as RabbitMQ or Kafka) and eventually update the records in the database and push the processed/transformed events to another sink (probably another message broker).

Events related to a specific record will ideally need to be processed in FIFO ordering (although there will be a timestamp which helps detect out of order events too), but events related to different records can be processed in parallel. I was planning to use the keyBy() construct to partition the stream by record.

The processing that needs to be done depends on the current information in the database about the record. However, I am unable to find an example or recommended approach to query a database for such records to enrich the event that it is being processed with the additional information I need to process it.

The pipeline I have in mind is as follows:

-> keyBy() on the id received -> retrieve the record from the database corresponding to the id -> perform processing steps on the record -> push the processed event to an external queue and update the database record

The database record will need to be updated because another application will be querying the data.

There might be additional optimisations one could do after this pipeline is achieved. For example one could cache the (updated) record in a managed state so that the next event on the same record will not need another database query. However, if the application does not know about a specific record it will need to retrieve it from the database.

What is the best approach to use for this kind of scenario in Apache Flink?

868

asked Aug 10 '16 06:08

jbx

1 Answers

You can perform database look up by extending a rich function for e.g. a RichFlatMap function, initialize the database connection once in its open() method and then process each event in the flatMap() method:

public static class DatabaseMapper extends RichFlatMapFunction<Event, EncrichedEvent> {

    // Declare DB coonection and query statements

    @Override
    public void open(Configuration parameters) throws Exception {
      // Initialize Database connection
      // Prepare Query statements
    }

    @Override
    public void flatMap(Event currentEvent, Collector<EncrichedEvent> out) throws Exception {
      // look up the Database, update record, enrich event
      out.collect(enrichedEvent);        
    }
})

And then you can use DatabaseMapper as follows:

stream.keyby(id)
      .flatmap(new DatabaseMapper())
      .addSink(..);

You can find here an example using cached data from Redis.

answered Oct 12 '22 02:10

Yassine Marzougui

Related questions
                            
                                How to take screenshot of WHOLE activity page programmatically? [closed]
                            
                                Obfuscating WAR file with Proguard
                            
                                How to cancel AsyncRestTemplate HTTP request if they are taking too much time?
                            
                                Java Thread.sleep() on Windows 10 stops in S3 sleep status
                            
                                Error creating bean with name 'liquibase' defined in class path resource ... /config/DatabaseConfiguration.class
                            
                                @CompoundIndex not working in Spring Data MongoDB
                            
                                Why do heap memory usage and number of loaded classes keep increasing?
                            
                                Is throwing an exception inside a finally block a performance issue?
                            
                                Hibernate trying to persist same object twice
                            
                                Is Apache Kafka able to handle transactions?
                            
                                Specifying an exception-specific backoff policy with Spring-Retry
                            
                                Optimizing Opportunities with Java Streams
                            
                                Using Java Void type in Kotlin
                            
                                How to initialize the data for each invocation in JMH?
                            
                                When two interfaces have conflicting return types, why does one method become default?
                            
                                Make the java compiler warn when an annotated method is used (like @deprecated)
                            
                                Android AutoCompleteTextView popup moving after displaying
                            
                                What's the difference between a Key and a KeySpec?
                            
                                What are Thread Groups in Java?
                            
                                How to make custom implementation of Retrofit2.Call<T>

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to look up and update the state of a record from a database in Apache Flink?

Tags:

java

stream

apache-flink

jbx

People also ask

1 Answers

Yassine Marzougui

Recent Activity

Donate For Us