
Merging two streams in Spark Streaming

Could you point me in the right direction on the following question? (Even a link to documentation containing the required information would be appreciated.)

Is there any way to merge multiple streams of data into a single stream of tuples?

For example, we have stream A with elements (A1, t1), (A2, t2), ..., (An, tn) and stream B with elements (B1, t1'), (B2, t2'), ..., (Bn, tn').

Here t is the timestamp of each value (the values are actually time series).

I would like to obtain a stream C with values

(A1", B1", t1"), ..., (An", Bn", tn")

The timestamps in streams A and B can differ (which is why I use ' and "). The metrics may be consumed at different times and at different rates. In that case, for each required timestamp the merge should take the latest value available from each stream.
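To make the desired behaviour concrete, here is a minimal plain-Python sketch (not Spark code). It assumes that "latest" means the most recent sample at or before the required timestamp, and that each stream is a list of (value, timestamp) pairs sorted by timestamp. The names `latest_at` and `merge_as_of` are hypothetical and used only for illustration.

```python
from bisect import bisect_right


def latest_at(series, t):
    """Return the value of the latest (value, timestamp) sample with timestamp <= t.

    Assumes `series` is sorted by timestamp. Returns None if no sample is old enough.
    """
    times = [ts for _, ts in series]
    i = bisect_right(times, t)  # number of samples with timestamp <= t
    return series[i - 1][0] if i else None


def merge_as_of(a, b, required_times):
    """Build (A-value, B-value, t) tuples for each required timestamp t."""
    return [(latest_at(a, t), latest_at(b, t), t) for t in required_times]


# Two series sampled at different times and rates (made-up data).
stream_a = [("A1", 1), ("A2", 4), ("A3", 9)]
stream_b = [("B1", 2), ("B2", 3), ("B3", 8)]

merged = merge_as_of(stream_a, stream_b, [5, 10])
print(merged)
```

With this sample data, the tuple for timestamp 5 pairs A2 (sampled at 4) with B2 (sampled at 3), since those are the latest samples not newer than 5.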

asked May 08 '14 15:05 by Lastik


1 Answer

You can use DStream.join. When called on two DStreams of (K, V) and (K, W) pairs, it returns a new DStream of (K, (V, W)) pairs containing all pairs of elements for each key.
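As a rough illustration of those semantics, the sketch below reproduces the per-key pairing in plain Python for a single batch of data. It is not Spark code, and `join_batch` is a hypothetical helper name. In Spark itself the equivalent call would be along the lines of `stream_a.join(stream_b)` on two pair DStreams. Note that the pairing relies on exact key equality, so timestamps that differ between the two streams would only match if the records were keyed by some shared value, such as a time bucket (an assumption here, not something stated above).

```python
from collections import defaultdict


def join_batch(left, right):
    """Inner-join two lists of (key, value) pairs, mimicking a pair join on one batch.

    Returns (key, (v, w)) for every combination of v from `left` and w from `right`
    that share the same key.
    """
    right_by_key = defaultdict(list)
    for key, w in right:
        right_by_key[key].append(w)
    return [(key, (v, w)) for key, v in left for w in right_by_key.get(key, [])]


# Records keyed by a shared timestamp label (made-up data).
batch_a = [("t1", "A1"), ("t2", "A2")]
batch_b = [("t1", "B1"), ("t3", "B3")]

joined = join_batch(batch_a, batch_b)
print(joined)
```

Only the key `t1` appears in both batches, so it is the only one that produces a joined pair; keys present on just one side are dropped, as in an inner join.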

answered Dec 28 '22 09:12 by Laeeq