
How to update a static DataFrame with a streaming DataFrame in Spark Structured Streaming

I have a static DataFrame with millions of rows, as follows.

Static DataFrame:

+---+----------+
| id|time_stamp|
+---+----------+
|  1|1540527851|
|  2|1540525602|
|  3|1530529187|
|  4|1520529185|
|  5|1510529182|
|  6|1578945709|
+---+----------+

In every batch, a streaming DataFrame is formed that contains the id and an updated time_stamp after some operations, as below.

In the first batch:

+---+----------+
| id|time_stamp|
+---+----------+
|  1|1540527888|
|  2|1540525999|
|  3|1530529784|
+---+----------+

After every batch, I want to update the static DataFrame with the updated values from the streaming DataFrame, as follows. How can I do that?

Static DataFrame after the first batch:

+---+----------+
| id|time_stamp|
+---+----------+
|  1|1540527888|
|  2|1540525999|
|  3|1530529784|
|  4|1520529185|
|  5|1510529182|
|  6|1578945709|
+---+----------+

I've already tried except(), union(), and a 'left_anti' join, but it seems Structured Streaming doesn't support such operations on a streaming DataFrame.

Asked Oct 17 '22 by Swarup

1 Answer

So I resolved this issue with the Spark 2.4.0 AddBatch method, which converts the streaming DataFrame into mini-batch DataFrames. But for versions below 2.4.0 it's still a headache.

Answered Oct 23 '22 by Swarup