I am working on a project where I want to perform data acquisition, data processing, and GUI visualization (using PyQt with pyqtgraph), all in Python. Each of the parts is implemented in principle, but they are not well separated, which makes it difficult to benchmark and improve performance. So the question is:
Is there a good way to pass large amounts of data between the different parts of a software application?
I am thinking of something like the following scenario:
When I say "large amounts of data", I mean that I get arrays with approximately 2 million data points (16-bit) per second that need to be processed and possibly also stored.
Is there any framework for Python that I can use to handle such a large amount of data properly, maybe in the form of a data server that I can connect to?
In other words, are you acquiring so much data that you cannot keep all of it in memory while you need it?
For example, some measurements generate so much data that the only way to process them is after the fact.
If your computer system is able to keep pace with the generation of data, you can use a separate Python queue between each stage.
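As a rough sketch of that idea (the stage functions `acquire_chunk` and `process_chunk` are hypothetical placeholders, not part of any library): each stage runs in its own thread, and a bounded `queue.Queue` between them blocks the producer when the consumer falls behind, so nothing is dropped.

```python
import queue
import threading

# Hypothetical two-stage pipeline: acquisition -> processing,
# connected by a bounded queue that provides back-pressure.
acq_to_proc = queue.Queue(maxsize=100)

def acquisition_stage():
    for _ in range(1000):           # stand-in for a real acquisition loop
        data = acquire_chunk()      # hypothetical: read one chunk from hardware
        acq_to_proc.put(data)       # blocks if processing falls behind
    acq_to_proc.put(None)           # sentinel: no more data

def processing_stage():
    while True:
        data = acq_to_proc.get()
        if data is None:            # sentinel received -> shut down cleanly
            break
        process_chunk(data)         # hypothetical: process/store one chunk

threading.Thread(target=acquisition_stage).start()
threading.Thread(target=processing_stage).start()
```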
If your measurements are creating more data than your system can consume, then you should start by defining a few tiers (maybe just two) of how important your data is:
One analogy might be a video stream...
- lossless -- gold-masters for archival
- lossy -- YouTube, Netflix, Hulu might drop a few frames, but your experience doesn't significantly suffer
From your description, the Acquisition and Processing must be lossless, while the GUI/visualization can be lossy.
For lossless data, you should use queues. For lossy data, you can use deques.
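A minimal illustration of the difference (the `chunk` handlers and the pyqtgraph call are my assumptions, not from the original): a `queue.Queue` keeps every item until a consumer takes it, while a `collections.deque` with `maxlen=1` silently discards older items, so the GUI always sees only the newest data and never slows the pipeline down.

```python
import collections
import queue

lossless = queue.Queue()                      # every chunk kept for processing/storage
latest_for_gui = collections.deque(maxlen=1)  # only the newest chunk kept for display

def on_new_chunk(chunk):
    lossless.put(chunk)             # never dropped; the consumer must keep up
    latest_for_gui.append(chunk)    # older chunks are discarded automatically

def refresh_plot():
    # Called from a GUI timer; it never blocks on the data pipeline.
    if latest_for_gui:
        chunk = latest_for_gui[-1]
        # curve.setData(chunk)      # hypothetical pyqtgraph update
```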
Regardless of your data container, there are several ways to connect your stages.
It seems like you just need a 1-1 relationship between each stage, so a producer-consumer design looks like it will suit your application.
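Putting the pieces together, here is a sketch of such a producer-consumer pipeline under a few assumptions (the chunk size, the fake NumPy acquisition, and the averaging step are all placeholders): at roughly 2 million 16-bit samples per second the raw stream is only about 4 MB/s, which a queue of NumPy arrays handles comfortably. The acquisition thread feeds a lossless queue, the processing thread consumes it and publishes its latest result to a lossy deque, and the Qt side would poll that deque from a timer.

```python
import collections
import queue
import threading
import time

import numpy as np

CHUNK = 200_000                        # assumed chunk size: 0.1 s of data at ~2 M samples/s
raw_q = queue.Queue(maxsize=50)        # lossless: acquisition -> processing (back-pressure)
display = collections.deque(maxlen=1)  # lossy: processing -> GUI (latest result only)

def acquisition():
    """Producer: pretend to read 16-bit samples from hardware."""
    while True:
        data = np.random.randint(0, 2**16, CHUNK, dtype=np.uint16)  # fake acquisition
        raw_q.put(data)                # blocks rather than dropping raw data
        time.sleep(0.1)                # stand-in for the hardware's own pacing

def processing():
    """Consumer: process every chunk, offer only the latest result to the GUI."""
    while True:
        data = raw_q.get()
        result = data.astype(np.float32).mean()   # stand-in for the real processing
        display.append((data, result))            # the GUI only needs the newest one

threading.Thread(target=acquisition, daemon=True).start()
threading.Thread(target=processing, daemon=True).start()

time.sleep(1)                          # let the pipeline run briefly for demonstration
if display:
    print("latest mean:", display[-1][1])

# In the real application, a Qt QTimer would poll `display` every few tens of
# milliseconds and hand the newest chunk to pyqtgraph (e.g. PlotDataItem.setData),
# without ever blocking the acquisition or processing threads.
```

If processing turns out to be CPU-bound, the same structure carries over to `multiprocessing` queues, at the cost of serializing the arrays between processes.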