
How can I asynchronously load data from large files in Qt?

I'm using Qt 5.2.1 to implement a program that reads in data from a file (could be a few bytes to a few GB) and visualises that data in a way that's dependent on every byte. My example here is a hex viewer.

One object does the reading, and emits a signal dataRead() when it's read a new block of data. The signal carries a pointer to a QByteArray like so:

filereader.cpp

void FileReader::startReading()
{

    /* Object state code here... */

        {
            QFile inFile(fileName);

            if (!inFile.open(QIODevice::ReadOnly))
            {
                changeState(STARTED, State(ERROR, QString()));
                return;
            }

            while(!inFile.atEnd())
            {
                QByteArray *qa = new QByteArray(inFile.read(DATA_SIZE));
                qDebug() << "emitting dataRead()";
                emit dataRead(qa);
            }
        }

    /* Emit EOF signal */

}

The viewer has its loadData slot connected to this signal, and this is the function that displays the data:

hexviewer.cpp

void HexViewer::loadData(QByteArray *data)
{
    QString hexString = data->toHex();

    for (int i = 0; i < hexString.length(); i+=2)
    {
        _ui->hexTextView->insertPlainText(hexString.at(i));
        _ui->hexTextView->insertPlainText(hexString.at(i+1));
        _ui->hexTextView->insertPlainText(" ");
    }

    delete data;
}

The first problem is that if this is just run as-is, the GUI thread will become completely unresponsive. All of the dataRead() signals will be emitted before the GUI is ever redrawn.

(The full code can be run, and when you use a file bigger than about 1kB, you will see this behaviour.)

Going by the response to my forum post "Non-blocking local file IO in Qt5" and the answer to another Stack Overflow question, "How to do async file io in qt?", the answer is: use threads. But neither of these answers goes into any detail about how to shuffle the data itself around, or how to avoid common errors and pitfalls.

If the data were small (on the order of a hundred bytes) I'd just emit it with the signal. But if the file is gigabytes in size, or (edit) if it's on a network-based filesystem such as NFS or a Samba share, I don't want the UI to lock up just because reading the file blocks.

The second problem is that using new in the emitter and delete in the receiver seems a bit naive: I'm effectively using the entire heap as a cross-thread queue.

Question 1: Does Qt have a better or more idiomatic way to move data across threads while limiting memory consumption? Does it have a thread-safe queue or other structures that can simplify this whole thing?

Question 2: Do I have to implement the threading etc. myself? I'm not a huge fan of reinventing wheels, especially regarding memory management and threading. Are there higher-level constructs that can already do this, like there are for network transport?

Score 657, asked Jan 03 '16 00:01 by detly


2 Answers

First of all, you don't have any multithreading in your app at all. Your FileReader class is a subclass of QThread, but that does not mean that all FileReader methods will be executed in another thread: only run() executes in the new thread. In fact, all your operations are performed in the main (GUI) thread.

FileReader should be a QObject and not a QThread subclass. Then you create a basic QThread object and move your worker (reader) to it using QObject::moveToThread. You can read about this technique here.

Make sure you have registered the FileReader::State type using qRegisterMetaType(). This is necessary for queued signal-slot connections between different threads to work with that type.

An example:

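HexViewer::HexViewer(QWidget *parent) :
    QMainWindow(parent),
    _ui(new Ui::HexViewer),
    _fileReader(new FileReader())   // no parent: an object with a parent cannot be moved to another thread
{
    // Needed so FileReader::State can be passed through queued (cross-thread) connections
    qRegisterMetaType<FileReader::State>("FileReader::State");

    QThread *readerThread = new QThread(this);
    readerThread->setObjectName("ReaderThread");
    // Free the worker once its thread has finished
    connect(readerThread, SIGNAL(finished()),
            _fileReader, SLOT(deleteLater()));
    // From now on, slots of _fileReader triggered by signals from other threads
    // run in readerThread (direct method calls still run in the caller's thread)
    _fileReader->moveToThread(readerThread);
    readerThread->start();

    _ui->setupUi(this);

    ...
}

void HexViewer::on_quitButton_clicked()
{
    // Ask the reader thread's event loop to exit, then block until it has.
    // quit() only takes effect once the worker returns to its event loop.
    _fileReader->thread()->quit();
    _fileReader->thread()->wait();

    qApp->quit();
}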

Also it is not necessary to allocate data on the heap here:

while(!inFile.atEnd())
{
    QByteArray *qa = new QByteArray(inFile.read(DATA_SIZE));
    qDebug() << "emitting dataRead()";
    emit dataRead(qa);
}

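QByteArray uses implicit sharing: copying one (passing it by value, or emitting it through a signal) only copies a pointer to the shared data and increments a reference count. The bytes themselves are copied only if one of the copies is modified, and the reference counting is thread-safe, so this also holds when the array crosses threads.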

Change the code above to this and forget about manual memory management:

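// The signal is now declared as: void dataRead(const QByteArray &data);
while(!inFile.atEnd())
{
    QByteArray qa = inFile.read(DATA_SIZE);   // no new/delete needed
    qDebug() << "emitting dataRead()";
    emit dataRead(qa);   // a queued connection copies the shared handle, not the bytes
}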
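Implicit sharing is built into QByteArray, so nothing extra is needed in Qt. To make the idea concrete outside Qt, here is a rough standard-C++ analogue (a swapped-in technique, not Qt's implementation, and without copy-on-write): one immutable buffer shared through std::shared_ptr, so handing it to a receiver copies a pointer and bumps a reference count instead of copying the bytes. Chunk, makeChunk and consume are hypothetical names.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a QByteArray chunk: an immutable, reference-counted buffer.
using Chunk = std::shared_ptr<const std::vector<char>>;

// Wrap freshly read bytes once; every later copy of the Chunk shares them.
Chunk makeChunk(std::vector<char> bytes)
{
    return std::make_shared<std::vector<char>>(std::move(bytes));
}

// Receiver taking the chunk by value, like a slot: only the pointer is copied.
std::size_t consume(Chunk chunk)
{
    return chunk->size();
}
```

Because the shared buffer is const, nothing ever detaches here; a receiver that wants to modify the bytes has to copy them explicitly, which is the step QByteArray performs automatically on the first write.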

But anyway, the main problem is not multithreading. The problem is that QTextEdit::insertPlainText() is not a cheap operation, especially with a large amount of data. FileReader reads file data quickly and then floods your widget with new chunks to display.

Note also that your implementation of HexViewer::loadData is very inefficient. It inserts the text one character at a time (three insertPlainText() calls per byte), so QTextEdit has to update its document and layout over and over, which freezes the GUI.

You should prepare the resulting hex string first (note that the data parameter is no longer a pointer):

void HexViewer::loadData(const QByteArray &data)
{
    QString tmp = data.toHex();   // e.g. "0a1bff"

    QString hexString;
    hexString.reserve(tmp.size() / 2 * 3);   // every 2 hex digits become "xx "

    const int hexLen = 2;

    for (int i = 0; i < tmp.size(); i += hexLen)
    {
        hexString.append(tmp.mid(i, hexLen));
        hexString.append(QLatin1Char(' '));
    }

    // One insertion per chunk instead of three per byte
    _ui->hexTextView->insertPlainText(hexString);
}
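The same build-once idea in plain standard C++ (no Qt; toSpacedHex is a hypothetical name): reserve the final size, fill the buffer in one pass, and only then hand the result to the widget, which in the Qt code above corresponds to a single insertPlainText() call per chunk. The output uses lowercase digits, like QByteArray::toHex().

```cpp
#include <string>
#include <string_view>

// Format bytes as lowercase hex pairs followed by a space ("01 ab ff "),
// building the whole string in one buffer before it is displayed.
std::string toSpacedHex(std::string_view bytes)
{
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(bytes.size() * 3);        // exactly 3 chars per byte: two digits + space
    for (unsigned char b : bytes) {
        out.push_back(digits[b >> 4]);    // high nibble
        out.push_back(digits[b & 0x0F]);  // low nibble
        out.push_back(' ');
    }
    return out;
}
```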

Anyway, the bottleneck of your application is not file reading but QTextEdit updating. Loading data in chunks and appending each one with QTextEdit::insertPlainText() will not speed anything up. For files smaller than about 1 MB it is faster to read the whole file at once and then set the resulting text on the widget in a single step.
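In Qt, reading everything at once is QFile::readAll() followed by a single setPlainText() on the text widget. The sketch below shows the same single-read step with the standard library instead (readWholeFile is a hypothetical helper); it assumes the file comfortably fits in memory, which is the whole premise of this shortcut.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Read an entire (small) file into memory in one go, binary-safe.
std::string readWholeFile(const std::string &path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```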

I suspect you can't easily display texts larger than several megabytes using the default Qt widgets. This task requires a non-trivial approach that in general has nothing to do with multithreading or asynchronous data loading: it's all about creating a custom widget that won't try to display its huge contents at once.
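The answer doesn't describe such a widget. One common approach, sketched here as an assumption rather than as the answer's design, is virtualised rendering: the view keeps only a scroll position, works out which rows are on screen, and formats just those bytes (for example after seeking to the right file offset), so the cost of each repaint depends on the window height, not the file size. visibleRows and its row layout are hypothetical, and the code is plain standard C++ rather than a QWidget.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <string>
#include <string_view>
#include <vector>

// Format only the rows currently on screen, hex-viewer style:
// "00000010  41 42 43 ..." (hex offset, then up to bytesPerRow hex pairs).
std::vector<std::string> visibleRows(std::string_view data,
                                     std::size_t firstRow,
                                     std::size_t rowCount,
                                     std::size_t bytesPerRow)
{
    std::vector<std::string> rows;
    for (std::size_t r = firstRow; r < firstRow + rowCount; ++r) {
        const std::size_t begin = r * bytesPerRow;
        if (begin >= data.size())
            break;                                       // scrolled past the end of the data
        const std::size_t end = std::min(begin + bytesPerRow, data.size());

        char offset[32];
        std::snprintf(offset, sizeof offset, "%08zx ", begin);
        std::string line = offset;
        for (std::size_t i = begin; i < end; ++i) {
            char hex[4];
            std::snprintf(hex, sizeof hex, " %02x", static_cast<unsigned char>(data[i]));
            line += hex;
        }
        rows.push_back(line);
    }
    return rows;
}
```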

Score 88, answered Nov 13 '22 03:11 by hank


This looks like a case for a producer-consumer setup with semaphores. There is a very specific example that walks you through implementing it properly. Apart from your main thread, you need one more thread to make this work.

The setup should be:

  • Thread A runs your FileReader as the producer.
  • Your GUI thread runs your HexViewer widget, which consumes the data on specific events. Before calling QSemaphore::acquire(), check QSemaphore::available() so that the GUI does not block (QSemaphore::tryAcquire() does the check and the acquire in one step).
  • FileReader and HexViewer both have access to a third class, e.g. DataClass, into which the reader places data as it is read and from which the consumer retrieves it. The semaphores should also be defined there.
  • There is no need to emit a signal carrying the data, or any other notification.
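Qt's documentation includes a Semaphores Example that builds this kind of producer-consumer buffer from two QSemaphores. The sketch below is a standard-C++ variant of the same idea that swaps the semaphores for a std::mutex and a std::condition_variable (BoundedChunkQueue is a hypothetical name). The bounded capacity caps memory by making the reader wait when the viewer falls behind, and the consumer side never blocks, mirroring the advice to check before acquiring in the GUI thread.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

// Bounded single-producer/single-consumer buffer (capacity must be at least 1).
// The capacity caps memory: the reader thread blocks in push() while it is full.
class BoundedChunkQueue {
public:
    explicit BoundedChunkQueue(std::size_t capacity) : capacity_(capacity) {}

    // Producer side (reader thread): waits while the buffer is full.
    void push(std::string chunk)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        notFull_.wait(lock, [this] { return chunks_.size() < capacity_; });
        chunks_.push_back(std::move(chunk));
    }

    // Consumer side (GUI thread): never blocks; returns nothing when empty.
    std::optional<std::string> tryPop()
    {
        std::optional<std::string> chunk;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (chunks_.empty())
                return std::nullopt;
            chunk = std::move(chunks_.front());
            chunks_.pop_front();
        }
        notFull_.notify_one();   // wake the producer if it was waiting for space
        return chunk;
    }

private:
    std::mutex mutex_;
    std::condition_variable notFull_;
    std::deque<std::string> chunks_;
    const std::size_t capacity_;
};
```

In a Qt program, FileReader's thread would call push() for each chunk, and HexViewer would call tryPop() from something that already runs in the GUI thread, such as a QTimer slot or paintEvent(), stopping when it returns an empty optional.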

That pretty much covers moving the data from FileReader to your widget, but it does not cover how to actually paint this data. To achieve this, you can consume the data in an override of HexViewer's paintEvent(), reading whatever has been put in the queue. A more elaborate approach would be to write an event filter.

On top of this, you may want to set a maximum number of bytes read, after which HexViewer is explicitly signaled to consume the data.
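A minimal sketch of that limit, assuming the reader feeds it the size of each chunk it stores and sends its single notification to HexViewer (a queued signal, in Qt) only when it returns true. ConsumeThreshold is a hypothetical helper in standard C++, not a Qt class.

```cpp
#include <cstddef>

// Tracks bytes stored since the last notification and reports when the viewer
// should be told to consume them, instead of notifying once per chunk.
class ConsumeThreshold {
public:
    explicit ConsumeThreshold(std::size_t maxPendingBytes) : max_(maxPendingBytes) {}

    // Returns true when the pending total reaches the limit; the count then resets.
    bool addBytes(std::size_t n)
    {
        pending_ += n;
        if (pending_ < max_)
            return false;
        pending_ = 0;
        return true;
    }

private:
    const std::size_t max_;
    std::size_t pending_ = 0;
};
```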

Note that this solution is completely asynchronous, thread-safe and ordered, since none of your data is sent to HexViewer; instead, HexViewer consumes it only when it needs to display it on the screen.

Score 2, answered Nov 13 '22 03:11 by g24l