Combine two streams in Apache Flink regardless of window time

Tags: join, stream, streaming, apache-flink

I have two data streams that I want to combine. The problem is that one stream has a much higher frequency than the other, and there are times when one stream receives no events at all. Is it possible to take the last event from one stream and join it with every event arriving on the other stream?

The only solution I found is the join function, but it requires a common window over which the join is applied. That window is never reached when one stream is not receiving any events.

Is it possible to apply the join function on every event arriving from either stream, while keeping the last consumed event in state and using it for the join?

278

asked Sep 02 '17 14:09

FLoppix

1 Answer

There are many different approaches to combining or joining two streams in Flink, depending on the requirements of each specific use case. When doing this "by hand", you want to use Flink's ConnectedStreams with a RichCoFlatMapFunction or a CoProcessFunction. Either of these lets you keep managed state (e.g., the last element from the infrequently updating stream) and join it with the faster stream. CoProcessFunction adds the ability to work with timers, which you should use to clear state for expired keys, if that's relevant.
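To make the idea concrete, here is a minimal sketch of the join logic in plain Java, with no Flink dependency. It is a model of the technique, not Flink code: every name in it (LatestValueJoin, onSlow, onFast, expire, the TTL parameter) is hypothetical. In an actual job the map would be keyed managed state such as a ValueState, onSlow and onFast would correspond to flatMap1/flatMap2 of a RichCoFlatMapFunction (or processElement1/processElement2 of a CoProcessFunction), and expire would be the work done in a timer callback.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Plain-Java model of a "latest value" join. All names are hypothetical
// and chosen for illustration; this is not part of the Flink API.
final class LatestValueJoin<K, S, F> {

    // A fast-stream event paired with the most recent slow-stream value.
    static final class Joined<S, F> {
        final S lastSlow;
        final F fast;

        Joined(S lastSlow, F fast) {
            this.lastSlow = lastSlow;
            this.fast = fast;
        }
    }

    // A stored slow-stream value plus the time at which it arrived.
    private static final class Stamped<S> {
        final S value;
        final long timestamp;

        Stamped(S value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Stand-in for per-key managed state.
    private final Map<K, Stamped<S>> lastSlowByKey = new HashMap<>();
    private final long ttlMillis;

    LatestValueJoin(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Slow stream: remember the newest value for the key and emit nothing.
    void onSlow(K key, S value, long now) {
        lastSlowByKey.put(key, new Stamped<>(value, now));
    }

    // Fast stream: emit a joined record only if a slow value is stored for the key.
    Optional<Joined<S, F>> onFast(K key, F event) {
        Stamped<S> last = lastSlowByKey.get(key);
        return last == null
                ? Optional.empty()
                : Optional.of(new Joined<>(last.value, event));
    }

    // Stand-in for a timer callback: drop values that are at least ttlMillis old.
    void expire(long now) {
        lastSlowByKey.values().removeIf(s -> now - s.timestamp >= ttlMillis);
    }

    public static void main(String[] args) {
        LatestValueJoin<String, Double, Integer> join = new LatestValueJoin<>(1000L);
        join.onSlow("sensor-1", 0.5, 0L);
        join.onFast("sensor-1", 2)
            .ifPresent(j -> System.out.println(j.lastSlow + " joined with " + j.fast));
        join.expire(1000L); // the stored value has reached its TTL and is removed
        System.out.println(join.onFast("sensor-1", 3).isPresent());
    }
}
```

Note that no window is involved: each fast-stream event is joined as soon as it arrives, and a quiet slow stream simply means the same stored value keeps being reused until it expires.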

There's an exercise on the Flink training site about different approaches for implementing such joins: Enrichment Joins. For a simpler example, see also the exercise about Expiring State.

Each recent release of Flink has included additional built-in join functions, so at this point it is less often necessary to roll your own. See the pages on joining with the DataStream API, joins with the Table API, and joins in SQL for more details.

171

answered Oct 12 '22 08:10

David Anderson
