I have a static DataFrame with millions of rows, as follows.
Static DataFrame:
--------------
|id|time_stamp|
--------------
|1|1540527851|
|2|1540525602|
|3|1530529187|
|4|1520529185|
|5|1510529182|
|6|1578945709|
--------------
Now, in every batch, a streaming DataFrame is formed that contains id and an updated time_stamp after some operations, as shown below.
In the first batch:
--------------
|id|time_stamp|
--------------
|1|1540527888|
|2|1540525999|
|3|1530529784|
--------------
In every batch, I want to update the static DataFrame with the updated values from the streaming DataFrame, as follows. How can I do that?
Static DataFrame after the first batch:
--------------
|id|time_stamp|
--------------
|1|1540527888|
|2|1540525999|
|3|1530529784|
|4|1520529185|
|5|1510529182|
|6|1578945709|
--------------
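The update being described is an upsert keyed on id: rows whose id appears in the batch take the batch's time_stamp, and all other rows are kept unchanged. A minimal pure-Python sketch of that semantics, using the values from the tables above:

```python
def upsert(static, batch):
    """Return static updated with batch: batch rows win on matching id."""
    merged = dict(static)   # id -> time_stamp
    merged.update(batch)    # ids present in the batch take the batch value
    return merged

# Data from the tables above
static = {1: 1540527851, 2: 1540525602, 3: 1530529187,
          4: 1520529185, 5: 1510529182, 6: 1578945709}
batch1 = {1: 1540527888, 2: 1540525999, 3: 1530529784}

result = upsert(static, batch1)
# ids 1-3 now carry the batch timestamps; ids 4-6 are unchanged
```

The hard part of the question is not this merge itself but the fact that Structured Streaming restricts which operations may mix streaming and static DataFrames.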
I've already tried except(), union(), and a 'left_anti' join, but it seems Structured Streaming doesn't support such operations on a streaming DataFrame.
I eventually resolved this with Spark 2.4.0's foreachBatch method, which exposes the streaming DataFrame as mini-batch DataFrames. For versions below 2.4.0, though, it's still a headache.
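Inside the per-batch callback, the update reduces to exactly the operations that failed on the streaming DataFrame: a left_anti join (keep static rows whose id is absent from the batch) followed by a union with the batch. The Spark wiring is shown in comments because real streaming code needs a running session; the per-batch merge itself is simulated on plain lists of (id, time_stamp) rows:

```python
# With Spark >= 2.4 the pattern would look like (untested sketch;
# update_static and static_df are illustrative names):
#
#   def update_static(batch_df, batch_id):
#       global static_df
#       static_df = static_df.join(batch_df, "id", "left_anti").union(batch_df)
#
#   streaming_df.writeStream.foreachBatch(update_static).start()
#
# The merge applied to each batch is just "left_anti join, then union":

def merge_batch(static_rows, batch_rows):
    batch_ids = {r[0] for r in batch_rows}
    kept = [r for r in static_rows if r[0] not in batch_ids]  # left_anti join
    return sorted(kept + list(batch_rows))                    # union

static_rows = [(1, 1540527851), (2, 1540525602), (3, 1530529187),
               (4, 1520529185), (5, 1510529182), (6, 1578945709)]
batch1 = [(1, 1540527888), (2, 1540525999), (3, 1530529784)]

static_rows = merge_batch(static_rows, batch1)
```

Because foreachBatch reassigns the static DataFrame on every batch, later batches see the already-updated table, which is what the question asks for.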