The question is just done for educational purpose:
Does the use of 30-50 or more pairs of signals and slots between two object (for example two threads) affect the application performance, runtime or response times?
First of all, you should probably not put any slots in QThreads. QThreads aren't really meant to be derived from other than by reimplementing the run
method and private methods (not signals!).
A QThread is conceptually a thread controller, not a thread itself. In most cases you should deal with QObjects. Start a thread, then move the object instance to that thread. That's the only way you'll get slots working correctly in the thread. Moving the thread instance (it is QObject-derived!) to the thread is a hack and bad style. Don't do that in spite of uninformed forum posts telling otherwise.
As to the rest of your question: a signal-slot call does not have to locate anything nor validate much. The "location" and "validation" is done when the connection is established. The main steps done at the time of the call are:
Locking a signal-slot mutex from a pool.
Iterating through the connection list.
Performing the calls using either direct or queued calls.
Common Cost
Any signal-slot call always starts as a direct call in the signal's implementation generated by moc. An array of pointers-to-arguments of the signal is constructed on the stack. The arguments are not copied.
The signal then calls QMetaObject::activate
, where the connection list mutex is acquired, and the list of connected slots is iterated, placing the call for each slot.
Direct Connections
Not much is done there, the slot is called by either directly calling QObject::qt_static_metacall
obtained at the time the connection was established, or QObject::qt_metacall
if the QMetaObject::connect
was used to setup the connection. The latter allows dynamic creation of signals and slots.
Queued Connections
The arguments have to marshalled and copied, since the call has to be stored in an event queue and the signal must return. This is done by allocating an array of pointers to copies, and copy-consting each argument on the heap. The code to do that is really no-frills plain old C.
The queuing of the call is done within queued_activate
. This is where the copy-construction is done.
The overhead of a queued call is always at least one heap allocation of QMetaCallEvent
. If the call has any arguments, then a pointers-to-arguments array is allocated, and an extra allocation is done for each argument. For a call with n
arguments, the cost given as a C expression is (n ? 2+n : 1)
allocations. A return value for blocking calls is counter as an argument. Arguably, this aspect of Qt could be optimized down to one allocation for everything, but in real life it'd only matter if you're calling trivial methods.
Benchmark Results
Even a direct (non-queued) signal-slot call has a measurable overhead, but you have to choose your battles. Ease of architecting the code vs. performance. You do measure performance of your final application and identify bottlenecks, do you? If you do, you're likely to see that in real-life applications, signal-slot overheads play no role.
The only time signal-slot mechanism has significant overhead is if you're calling trivial functions. Say, if you'd call the trivial
slot in the code below. It's a complete, stand-alone benchmark, so feel free to run it and see for yourself. The results on my machine were:
Warming up the caches...
trivial direct call took 3ms
nonTrivial direct call took 376ms
trivial direct signal-slot call took 158ms, 5166% longer than direct call.
nonTrivial direct signal-slot call took 548ms, 45% longer than direct call.
trivial queued signal-slot call took 2474ms, 1465% longer than direct signal-slot and 82366% longer than direct call.
nonTrivial queued signal-slot call took 2474ms, 416% longer than direct signal-slot and 653% longer than direct call.
What should be noted, perhaps, is that concatenating strings is quite fast :)
Note that I'm doing the calls via a function pointer, this is to prevent the compiler from optimizing out the direct calls to the addition function.
//main.cpp
#include <cstdio>
#include <QCoreApplication>
#include <QObject>
#include <QTimer>
#include <QElapsedTimer>
#include <QTextStream>
static const int n = 1000000;
class Test : public QObject
{
Q_OBJECT
public slots:
void trivial(int*, int, int);
void nonTrivial(QString*, const QString&, const QString&);
signals:
void trivialSignalD(int*, int, int);
void nonTrivialSignalD(QString*, const QString&, const QString &);
void trivialSignalQ(int*, int, int);
void nonTrivialSignalQ(QString*, const QString&, const QString &);
private slots:
void run();
private:
void benchmark(bool timed);
void testTrivial(void (Test::*)(int*,int,int));
void testNonTrivial(void (Test::*)(QString*,const QString&, const QString&));
public:
Test();
};
Test::Test()
{
connect(this, SIGNAL(trivialSignalD(int*,int,int)),
SLOT(trivial(int*,int,int)), Qt::DirectConnection);
connect(this, SIGNAL(nonTrivialSignalD(QString*,QString,QString)),
SLOT(nonTrivial(QString*,QString,QString)), Qt::DirectConnection);
connect(this, SIGNAL(trivialSignalQ(int*,int,int)),
SLOT(trivial(int*,int,int)), Qt::QueuedConnection);
connect(this, SIGNAL(nonTrivialSignalQ(QString*,QString,QString)),
SLOT(nonTrivial(QString*,QString,QString)), Qt::QueuedConnection);
QTimer::singleShot(100, this, SLOT(run()));
}
void Test::run()
{
// warm up the caches
benchmark(false);
// do the benchmark
benchmark(true);
}
void Test::trivial(int * c, int a, int b)
{
*c = a + b;
}
void Test::nonTrivial(QString * c, const QString & a, const QString & b)
{
*c = a + b;
}
void Test::testTrivial(void (Test::* method)(int*,int,int))
{
static int c;
int a = 1, b = 2;
for (int i = 0; i < n; ++i) {
(this->*method)(&c, a, b);
}
}
void Test::testNonTrivial(void (Test::* method)(QString*, const QString&, const QString&))
{
static QString c;
QString a(500, 'a');
QString b(500, 'b');
for (int i = 0; i < n; ++i) {
(this->*method)(&c, a, b);
}
}
static int pct(int a, int b)
{
return (100.0*a/b) - 100.0;
}
void Test::benchmark(bool timed)
{
const QEventLoop::ProcessEventsFlags evFlags =
QEventLoop::ExcludeUserInputEvents | QEventLoop::ExcludeSocketNotifiers;
QTextStream out(stdout);
QElapsedTimer timer;
quint64 t, nt, td, ntd, ts, nts;
if (!timed) out << "Warming up the caches..." << endl;
timer.start();
testTrivial(&Test::trivial);
t = timer.elapsed();
if (timed) out << "trivial direct call took " << t << "ms" << endl;
timer.start();
testNonTrivial(&Test::nonTrivial);
nt = timer.elapsed();
if (timed) out << "nonTrivial direct call took " << nt << "ms" << endl;
QCoreApplication::processEvents(evFlags);
timer.start();
testTrivial(&Test::trivialSignalD);
QCoreApplication::processEvents(evFlags);
td = timer.elapsed();
if (timed) {
out << "trivial direct signal-slot call took " << td << "ms, "
<< pct(td, t) << "% longer than direct call." << endl;
}
timer.start();
testNonTrivial(&Test::nonTrivialSignalD);
QCoreApplication::processEvents(evFlags);
ntd = timer.elapsed();
if (timed) {
out << "nonTrivial direct signal-slot call took " << ntd << "ms, "
<< pct(ntd, nt) << "% longer than direct call." << endl;
}
timer.start();
testTrivial(&Test::trivialSignalQ);
QCoreApplication::processEvents(evFlags);
ts = timer.elapsed();
if (timed) {
out << "trivial queued signal-slot call took " << ts << "ms, "
<< pct(ts, td) << "% longer than direct signal-slot and "
<< pct(ts, t) << "% longer than direct call." << endl;
}
timer.start();
testNonTrivial(&Test::nonTrivialSignalQ);
QCoreApplication::processEvents(evFlags);
nts = timer.elapsed();
if (timed) {
out << "nonTrivial queued signal-slot call took " << nts << "ms, "
<< pct(nts, ntd) << "% longer than direct signal-slot and "
<< pct(nts, nt) << "% longer than direct call." << endl;
}
}
int main(int argc, char *argv[])
{
QCoreApplication a(argc, argv);
Test t;
return a.exec();
}
#include "main.moc"
Ofcourse they affect application performance, mainly due to the time spent over locating the connection object+ validating the slot object state n so .But the simplicity and flexibility of the signals and slots mechanism is well worth the overhead.Plus one of the major advantage of signal-slot mechanism is they are type=safe allowing communication between objects, irrespective of type of object unlike callbacks.
Compared to callbacks, signals and slots are slightly slower because of the increased flexibility they provide, although the difference for real applications is insignificant. In general, emitting a signal that is connected to some slots, is approximately ten times slower than calling the receivers directly, with non-virtual function calls. This is the overhead required to locate the connection object, to safely iterate over all connections (i.e. checking that subsequent receivers have not been destroyed during the emission), and to marshall any parameters in a generic fashion. While ten non-virtual function calls may sound like a lot, it's much less overhead than any new or delete operation, for example. As soon as you perform a string, vector or list operation that behind the scene requires new or delete, the signals and slots overhead is only responsible for a very small proportion of the complete function call costs.
Source:Signals and Slots
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With