I would like to append data to a published Dask dataset from a queue (like Redis). Other Python programs would then be able to fetch the latest data (e.g. once per second/minute) and do some further operations.
Should I build a pd.DataFrame first, or is it better to use some text importer? Thanks for any tips and advice.
You have a few options here.
What are the expected append speeds? Is it possible to append, let's say, 1k–10k rows per second?
Dask just tracks remote data. The speed of your application depends far more on how you choose to represent that data (e.g. Python lists vs. pandas DataFrames) than on Dask itself. Dask can handle thousands of tasks per second, and each of those tasks could hold a single row or millions of rows. It's up to how you build it.
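As a concrete illustration, here is a minimal sketch of the publish/append pattern using Dask's named datasets. It assumes a running distributed scheduler at `tcp://scheduler:8786`, a Redis list named `rows` carrying JSON-encoded records, and a dataset name `latest` — all of those names are illustrative, not from the original post:

```python
import json

import dask.dataframe as dd
import pandas as pd
import redis
from dask.distributed import Client

client = Client("tcp://scheduler:8786")  # connect to the shared scheduler
r = redis.Redis()

# Publish an (initially empty) dataset that other processes can look up by name.
empty = pd.DataFrame({"ts": pd.Series(dtype="datetime64[ns]"),
                      "value": pd.Series(dtype="float64")})
client.publish_dataset(latest=dd.from_pandas(empty, npartitions=1))

while True:
    # Block for up to 1 s waiting for a row, then drain whatever else is queued.
    item = r.blpop("rows", timeout=1)
    if item is None:
        continue
    rows = [json.loads(item[1])]
    nxt = r.lpop("rows")
    while nxt is not None:
        rows.append(json.loads(nxt))
        nxt = r.lpop("rows")

    # Turn the batch into a one-partition Dask DataFrame and append it.
    batch = dd.from_pandas(pd.DataFrame(rows), npartitions=1)
    combined = dd.concat([client.get_dataset("latest"), batch])

    # Republish under the same name so readers always see the newest version.
    client.unpublish_dataset("latest")
    client.publish_dataset(latest=combined)
```

A reader process would then do something like:

```python
from dask.distributed import Client

client = Client("tcp://scheduler:8786")
df = client.get_dataset("latest")
print(df.tail(10))  # .tail() computes and returns a pandas DataFrame
```

Note that repeatedly republishing a growing `dd.concat` result makes the task graph larger with every batch; in a long-running service you would likely want to `persist()` or compact the dataset periodically.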