I have one CSV file in a folder that is continuously being updated. I need to take inputs from this CSV file and produce some transactions. How can I read data from a CSV file that keeps being updated, say every 5 minutes?
I have tried the following:
val csvDF = spark
.readStream
.option("sep", ",")
.schema(userSchema)
.csv("file:///home/location/testFiles")
but the issue is that this monitors the folder for newly created files, whereas in my case there is only one file that keeps being updated.
tl;dr It won't work.
Spark Structured Streaming by default monitors files in a directory and triggers a computation for every new file. Once a file has been processed, it will never be processed again. That's the default implementation.
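If you can change the producer side to drop *new* files into the directory instead of updating one file in place, the default file source already does what you want, including the 5-minute cadence via a processing-time trigger. A hedged sketch, reusing the question's `userSchema` and path (the console sink and app name are placeholders):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("csv-stream").getOrCreate()

// userSchema as defined in the question (fields here are illustrative)
val userSchema = new StructType().add("id", "integer").add("amount", "double")

// Each *new* file that appears in the directory is processed exactly once.
val csvDF = spark.readStream
  .option("sep", ",")
  .schema(userSchema)
  .csv("file:///home/location/testFiles")

csvDF.writeStream
  .format("console")
  .trigger(Trigger.ProcessingTime("5 minutes")) // check for new files every 5 minutes
  .start()
  .awaitTermination()
```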
You could write your own streaming source that monitors a single file for changes, but that is custom source development (doable, but in most cases not worth the effort).
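Short of a custom source, a simpler workaround (my suggestion, not part of the answer above) is to skip streaming entirely and poll the file with periodic *batch* reads, since `spark.read` re-reads the whole file each time and therefore sees in-place updates:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("csv-poll").getOrCreate()

// userSchema as defined in the question (fields here are illustrative)
val userSchema = new StructType().add("id", "integer").add("amount", "double")

while (true) {
  // A batch read re-scans the file on every iteration, so any rows
  // added or changed since the last poll are picked up.
  val df = spark.read
    .option("sep", ",")
    .schema(userSchema)
    .csv("file:///home/location/testFiles")

  // ... produce transactions from df here ...

  Thread.sleep(5 * 60 * 1000) // wait 5 minutes between polls
}
```

Note that this reprocesses the entire file each time; if you need to handle only the newly appended rows, you would have to track your own offset (e.g. a row count or timestamp column) between polls.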