
How to load from disk, process, then store data in a common hdf5 concurrently with python, pyqt, h5py?

Premise:

I've created a main window. One of the drop-down menus has a 'ProcessData' item. When it's selected, I create a QProgressDialog. I then do a lot of processing in the main loop and periodically update the label and percentage in the QProgressDialog.

My processing looks like this: read a large amount of data from a file (a numpy memmapped array), do some signal processing, and write the output to a common h5py file. I iterate over the available input files, and all of the output is stored in a single h5py HDF5 file. The entire process takes about two minutes per input file and pins one CPU at 100%.
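Roughly, the current blocking loop looks like the sketch below (the function names, dtype, and dataset layout are placeholders for my real code):

    import numpy as np
    import h5py
    from scipy import signal

    def process_all(input_paths, output_path):
        # Blocking version: currently runs in the GUI thread
        with h5py.File(output_path, "a") as out:
            for i, path in enumerate(input_paths):
                # memory-mapped read of the raw input
                raw = np.memmap(path, dtype="float32", mode="r")
                # stand-in for the real signal processing (~2 min per file)
                result = signal.decimate(raw, 10)
                out.create_dataset("processed/%d" % i, data=result)
                # progress_dialog.setValue(...) would go here, but the
                # dialog never repaints because the event loop is blocked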

Goal:

How do I make this process non-blocking, so that the UI is still responsive? I'd still like my processing function to be able to update the QProgressDialog and its associated label.

Can I extend this to process more than one dataset concurrently and retain the ability to update the progressbar info?

Can I write into h5py from more than one thread/process/etc.? Will I have to implement locking on the write operation?

Software Versions:

I use Python 3.3+ with numpy/scipy/etc. The UI is built with PyQt4 4.11 / Qt 4.8, although I'd be interested in solutions that use Python 3.4 (and therefore asyncio) or PyQt5.

asked Sep 28 '22 by troy.unrau

1 Answer

This is quite a complex problem to solve, and this format is not really suited to providing complete answers to all your questions. However, I'll attempt to put you on the right track.

How do I make this process non-blocking, so that the UI is still responsive? I'd still like my processing function to be able to update the QProgressDialog and its associated label.

To make it non-blocking, you need to offload the processing into a Python thread or a QThread. Better still, offload it into a subprocess that communicates progress back to a thread in the main program.

I'll leave you to implement (or ask another question about) creating subprocesses or threads. However, you need to be aware that only the main (GUI) thread can access GUI methods. This means you need to emit a signal if using a QThread, or use QApplication.postEvent() from a Python thread (I've wrapped the latter up into a library for Python 2.7 here; Python 3 compatibility will come one day).
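As a minimal sketch of the QThread route (assuming PyQt4 as in your question; process_one_file() and input_paths are hypothetical stand-ins for your own processing code and file list):

    from PyQt4 import QtCore, QtGui

    class Worker(QtCore.QThread):
        # int: percent complete, str: label text
        progress = QtCore.pyqtSignal(int, str)

        def __init__(self, input_paths, parent=None):
            super(Worker, self).__init__(parent)
            self.input_paths = input_paths

        def run(self):
            # Runs in the worker thread: do NOT touch any widgets here
            for i, path in enumerate(self.input_paths):
                process_one_file(path)  # your memmap/scipy/h5py work
                percent = int(100 * (i + 1) / len(self.input_paths))
                self.progress.emit(percent, "Processed %s" % path)

    # In the main window, e.g. in the menu action's slot:
    def start_processing(self):
        self.dialog = QtGui.QProgressDialog("Processing...", "Cancel", 0, 100, self)
        self.worker = Worker(self.input_paths)
        # Cross-thread signal: the connected slots run in the GUI thread,
        # so it is safe for them to update the dialog
        self.worker.progress.connect(self.on_progress)
        self.worker.finished.connect(self.dialog.close)
        self.worker.start()

    def on_progress(self, percent, text):
        self.dialog.setLabelText(text)
        self.dialog.setValue(percent)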

Can I extend this to process more than one dataset concurrently and retain the ability to update the progressbar info?

Yes. One example would be to spawn many subprocesses. Each subprocess can be configured to send messages back to an associated thread in the main process, which communicates the progress information to the GUI via the method described for the above point. How you display this progress information is up to you.
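A minimal sketch of that wiring, assuming one multiprocessing.Queue shared by all workers and drained by a listener thread in the main process (do_processing() and report_to_gui() are hypothetical placeholders for your processing code and for the GUI hand-off described above):

    import threading
    from multiprocessing import Process, Queue

    def worker(job_id, path, queue):
        # Runs in its own process; only picklable messages go on the queue
        for percent in do_processing(path):   # hypothetical: yields 0-100
            queue.put((job_id, percent))
        queue.put((job_id, None))             # sentinel: this job is finished

    def start_jobs(paths, report_to_gui):
        queue = Queue()
        procs = [Process(target=worker, args=(i, p, queue))
                 for i, p in enumerate(paths)]
        for p in procs:
            p.start()

        def listen():
            done = 0
            while done < len(procs):
                job_id, percent = queue.get()
                if percent is None:
                    done += 1
                    continue
                # report_to_gui must hand off to the GUI thread via a
                # signal or postEvent(); never touch widgets directly here
                report_to_gui(job_id, percent)

        threading.Thread(target=listen, daemon=True).start()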

Can I write into h5py from more than one thread/process/etc.? Will I have to implement locking on the write operation?

You should not write to an HDF5 file from more than one thread or process at a time, so you will need to implement locking. I think possibly even read access should be serialised.
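One way to do that, sketched below, is a multiprocessing.Lock passed to every worker so that the shared file is only ever opened, written, and closed by one process at a time (the processing itself is faked with random data here):

    import h5py
    import numpy as np
    from multiprocessing import Process, Lock

    def worker(lock, output_path, name):
        result = np.random.rand(1000)   # stands in for the real processing
        # Hold the lock across open-write-close so only one process ever
        # has the shared HDF5 file open for writing at a time
        with lock:
            with h5py.File(output_path, "a") as f:
                f.create_dataset(name, data=result)

    if __name__ == "__main__":
        lock = Lock()
        procs = [Process(target=worker, args=(lock, "results.h5", "run_%d" % i))
                 for i in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()

An alternative that avoids locking entirely is a single dedicated writer process that owns the HDF5 file and consumes results from a queue.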

A colleague of mine has produced something along these lines for Python 2.7 (see here and here); you are welcome to look at it or fork it if you wish.

answered Oct 22 '22 by three_pineapples