 

Slow QTcpServer with lots of simultaneous clients

I'm writing a TCP server in Qt that will serve large files. The application logic is as follows:

  1. I've subclassed QTcpServer and reimplemented incomingConnection(int)
  2. In incomingConnection(), I create an instance of a "Streamer" class
  3. "Streamer" uses a QTcpSocket, initialized with setSocketDescriptor() from incomingConnection()
  4. When data from the client arrives, I send back an initial response from within the readyRead() slot, and then connect the socket's bytesWritten(qint64) signal to Streamer's bytesWritten() slot (a minimal sketch of these steps follows the list)
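
A minimal sketch of steps 1-4 (the StreamServer class name and the start()/readyRead() helpers are illustrative names, not my exact code):

class StreamServer : public QTcpServer {
    Q_OBJECT
protected:
    void incomingConnection(int socketDescriptor) {        // Qt 4.x signature
        Streamer *streamer = new Streamer(this);           // step 2: one Streamer per client
        streamer->start(socketDescriptor);
    }
};

void Streamer::start(int socketDescriptor) {
    m_socket = new QTcpSocket(this);
    m_socket->setSocketDescriptor(socketDescriptor);       // step 3
    connect(m_socket, SIGNAL(readyRead()), this, SLOT(readyRead()));
}

void Streamer::readyRead() {
    // parse the request, open m_file and send the initial response, then (step 4):
    connect(m_socket, SIGNAL(bytesWritten(qint64)), this, SLOT(bytesWritten()));
}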

bytesWritten looks something like:

Streamer.h:
...
private:
    QFile *m_file;
    char m_readBuffer[64 * 1024];
    QTcpSocket *m_socket;
...

Streamer.cpp
...
void Streamer::bytesWritten() {
    if (m_socket->bytesToWrite() <= 0) {
        const qint64 bytesRead = m_file->read(m_readBuffer, 64 * 1024);
        if (bytesRead > 0)                          // stop once the file is exhausted or the read fails
            m_socket->write(m_readBuffer, bytesRead);
    }
}
...

So basically I'm only writing new data once all pending data has been fully written, which I think is the most asynchronous way of doing it.

And everything works correctly, except that it's pretty slow when there are lots of simultaneous clients.

With about 5 clients, I'm downloading from that server at around 1 MB/s (the max of my home internet connection).

With about 140 clients, the download speed is around 100-200 KB/s.

The server's internet connection is 10 Gbps, and with 140 clients its utilization is around 100 Mbps, so I don't think that is the problem.

Server's memory usage with 140 clients - 100 MB of 2GB available

Server's CPU usage - max 20%

I'm using port 800.

When there were 140 clients on port 800 and the download speed through it was around 100-200 KB/s, I ran a separate copy on port 801 and was downloading at 1 MB/s without problems.

My guess is that somehow, Qt's event dispatching (or socket notifiers?) is too slow to handle all those events.

I've tried:

  1. Compiling whole Qt and my app with -O3
  2. Installing libglib2.0-dev and recompiling Qt (because QCoreApplication uses QEventDispatcherGlib or QEventDispatcherUNIX, so I wanted to see if there's any difference)
  3. Spawning a few threads and, in incomingConnection(int), using streamer->moveToThread() depending on how many clients are currently in a particular thread (a rough sketch of this appears after the list) - that didn't make any difference (though I did observe that speeds varied much more)
  4. Spawning worker processes using

Code:

main.cpp:
#include <sched.h>             // clone(), CLONE_FILES
#include <QCoreApplication>

// entry point of each worker started with clone()
int startWorker(void *argv) {
    int argc = 1;
    QCoreApplication a(argc, (char **)argv);

    Worker worker;             // my class; it receives socket descriptors from the main process (see below)
    worker.Start();

    return a.exec();
}

in main():
...
// one stack per worker; CLONE_FILES shares the descriptor table, so sockets
// accepted in the main process stay valid in the workers
long stack[16 * 1024];
clone(startWorker, (char *)stack + sizeof(stack) - 64, CLONE_FILES, (void *)argv);

and then starting a QLocalServer in the main process and passing socket descriptors from incomingConnection(int socketDescriptor) to the worker processes. It worked correctly, but download speeds were still slow.
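
For attempt 3 above, the distribution looked roughly like this (simplified; m_threads, m_clientCount and the start() slot are illustrative bookkeeping, not my exact code):

void StreamServer::incomingConnection(int socketDescriptor) {
    Streamer *streamer = new Streamer;                        // no parent, it will be moved
    QThread *target = m_threads.first();
    foreach (QThread *t, m_threads)                           // pick the least-loaded thread
        if (m_clientCount.value(t) < m_clientCount.value(target))
            target = t;
    ++m_clientCount[target];
    streamer->moveToThread(target);
    // setSocketDescriptor() has to be called in the streamer's thread,
    // so invoke the start() slot through its event loop:
    QMetaObject::invokeMethod(streamer, "start", Qt::QueuedConnection,
                              Q_ARG(int, socketDescriptor));
}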

Also tried:

  1. fork()-ing the process in incomingConnection() - that nearly killed the server :)
  2. Creating a separate thread for each client - speeds dropped to 50-100 KB/s
  3. Using QThreadPool with QRunnable - no difference

I'm using Qt 4.8.1

I ran out of ideas.

Is it Qt-related or maybe something with the server configuration?

Or maybe I should use a different language/framework/server? I need a TCP server that will serve files, but I also need to perform some specific tasks between packets, so I have to implement that part myself.

Asked by AdrianEddy

1 Answer

Your disk reads are blocking operations; they will stop any processing, including the handling of new network connections. Your disk has finite I/O throughput, too, and you can saturate it. You probably don't want your disk to stall the rest of your application. I don't think there's anything wrong with Qt here -- not until you run a profiler and show that Qt's CPU consumption is excessive, or that Qt somehow hits lock contention on its event queues (those are the only things that would matter here).

You should have your processing split across QObjects, as follows:

  1. Accepting incoming connections.

  2. Handling writing and reading from the sockets.

  3. Processing the incoming network data and issuing any non-file replies.

  4. Reading from disk and writing to the network.

Of course #1 and #2 are existing Qt classes.

You have to write #3 and #4. You can probably move #1 and #2 into one thread shared between them. #3 and #4 should be spread across a number of threads. An instance of #3 should be created for each active connection. Then, when the time comes to send file data, #3 instantiates #4. The number of threads available for #4 should be adjustable; you will probably find that there's an optimal setting for a particular workload. You can instantiate #3 and #4 across their threads in a round-robin fashion. Since disk access is blocking, the threads used for #4 should be exclusive and not used for anything else.
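
Something along these lines for the round-robin distribution (ObjectDistributor is just an illustrative helper, not Qt's QThreadPool; the thread counts are the tunables listed below):

#include <QThread>
#include <QList>

class ObjectDistributor {
public:
    explicit ObjectDistributor(int threadCount) : m_next(0) {
        for (int i = 0; i < threadCount; ++i) {
            QThread *thread = new QThread;
            thread->start();
            m_threads << thread;
        }
    }
    // Move the next #3 or #4 instance to a thread, round-robin.
    void assign(QObject *object) {
        object->moveToThread(m_threads.at(m_next));
        m_next = (m_next + 1) % m_threads.size();
    }
private:
    QList<QThread *> m_threads;
    int m_next;
};

// One distributor for the protocol objects (#3), and a separate one whose
// threads are used exclusively by the blocking disk readers (#4):
const int numNetworkThreads = 2, numDiskThreads = 4;   // tunables, see the list below
ObjectDistributor networkThreads(numNetworkThreads);
ObjectDistributor diskThreads(numDiskThreads);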

The #4 object should do disk reads when there's less than a certain amount of data left in the write buffer. This amount probably shouldn't be zero -- you want to keep those network interfaces busy at all times, if possible, and running out of data to send is one surefire way to idle them.

So I see at least the following tunable parameters that you will need to benchmark for:

  1. minNetworkWatermark - Minimum water level in the socket transmit buffer. You read from disk and write to the socket when there's less than that many bytes to write.

  2. minReadSize - Size of a minimal disk read. A file read would be of size qMax(minNetworkWatermark - socket->bytesToWrite(), minReadSize); a sketch of this follows the list.

  3. numDiskThreads - number of threads that #4 objects get moved to.

  4. numNetworkThreads - number of threads that #3 objects get moved to.
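
For example, the refill slot in the #4 object could look like this (DiskReader and the member names are illustrative; m_minNetworkWatermark and m_minReadSize hold tunables 1 and 2):

void DiskReader::refill() {
    const qint64 pending = m_socket->bytesToWrite();
    if (pending >= m_minNetworkWatermark)
        return;                                         // enough data is queued already
    const qint64 toRead = qMax(m_minNetworkWatermark - pending, m_minReadSize);
    const QByteArray chunk = m_file->read(toRead);      // blocking, but we're on a disk thread
    if (!chunk.isEmpty())
        m_socket->write(chunk);                         // if the socket lives in another thread,
                                                        // hand the chunk over with a queued signal
                                                        // instead of writing directly
}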

You will want to benchmark on different machines to get an idea of how fast things can go and what the effect of tuning is. Start the benchmarks on your development machine, whether desktop or notebook. Since it's your daily workhorse, you'd probably notice quickly if there were something wrong with its performance.

Answered by Kuba hasn't forgotten Monica