Qt 4.7: TCP thread, data transfer causes memory leak

Tags: c++, sockets, qt

I have solved this problem myself and the bounty will not be awarded. The problem arose as a consequence of a GUI operation that was being initiated by a non-GUI thread.

Qt 4.7 OSX 10.6.8

There's a lot of code in the app, but not a whole lot involved with what's going on.

The data memory leak occurs in the context of a single connection, which is opened, read, written and closed within a single Qt thread. I'm using a fixed memory object (pMsg) to hold my messages, then sending them to the external device like this:

m_pTcpSocket->write((char*)pMsg->Buf8, (qint64)pMsg->GetLength());

Buf8 is a 2048-byte static array. GetLength() returns the first 16 bits of the message ANDed with 0xFF, so a number from 0 to 255. It should return 4 for these messages, and always has in my diagnostics. Both operations are surrounded by their own mutexes (meaning, different mutexes). The message lengths are typically 4 bytes. The messages dependably get to the receiving device elsewhere on our wired LAN; they're correct when they arrive and the device responds appropriately with an ACK specific to only those messages. I've tried adding a call to flush() afterwards; it doesn't help (nor should there be anything to flush, but...). I don't know that the leak is in the write().

Sending these messages in turn causes me to receive an ACK message from the device. I read it like this:

if (m_pTcpSocket->waitForReadyRead(100))
{
    while ((bytesavailable = m_pTcpSocket->bytesAvailable()))
    {
        m_pTcpSocket->read(RBuf, bytesavailable);
        AssembleMsg(Buf, bytesavailable); // state machine empties Buf
    }
}

After the loop, bytesavailable is zero (of course). Buf is an unsigned char pointer to a static array of 2048 unsigned chars, on which I run a simple state machine that assembles the messages after each portion of data is received. Message lengths are 4. Messages are received and assembled as expected; no memory allocations are made, nor objects declared. Both operations are surrounded by their own mutexes (meaning, different mutexes, so they can't interact between rx and tx).

Once the message is assembled, all it does is reset a counter that sets the delay to the next keepalive message (which is what these are; without them, the device will drop the connection). The delay is accumulated by counting after the waitForReadyRead(100), which counts intervals of that length as long as the device sends nothing to this port, which is typical behavior. In this way, no timer is required. The timing works fine. Messages are read as soon as they arrive, or at least within 100 ms. They don't accumulate. So I thought the read buffer would not get larger. But... I don't know. Something is getting larger!
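
The timer-free keepalive timing described above can be sketched roughly as follows. This is only an illustration, not the poster's code: the class name `KeepaliveCounter`, its method names, and the choice to restart the count when a keepalive comes due are all assumptions.

```cpp
// Hypothetical helper: counts consecutive read timeouts (100 ms each in the
// question) and reports when the next keepalive is due. All names are assumed.
class KeepaliveCounter {
public:
    explicit KeepaliveCounter(int timeoutsPerKeepalive)
        : limit_(timeoutsPerKeepalive), idle_(0) {}

    // Call when waitForReadyRead(100) returns false (nothing arrived).
    // Returns true when it is time to send the next keepalive.
    bool onReadTimeout() {
        if (++idle_ < limit_) return false;
        idle_ = 0;  // assumption: sending a keepalive restarts the count
        return true;
    }

    // Call when a complete ACK has been assembled: restart the delay.
    void onAckAssembled() { idle_ = 0; }

    // Number of idle intervals counted since the last reset.
    int idleIntervals() const { return idle_; }

private:
    int limit_;
    int idle_;
};
```

With a limit of 10, for example, a keepalive would come due after ten consecutive 100 ms timeouts, about one second of silence from the device.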

So that's the read. But I don't know that the leak is in the read(), either.

BUT it HAS to be one or the other. If I don't send these messages (which means I don't get the ACK messages, either), then there is no leak. Nothing else changes anywhere in the application. This is the mode it powers up in, and no other activity is going on, I'm just keeping the connection open so when it's time to run the radio, the port is ready to go.

Both of these run in the same thread, and they both run off of the same socket. The thread runs continuously, and the same socket remains open (for hours, in fact.) So it's not a socket object delete issue.

The problem is exacerbated with certain brands of SDR radios, as they require the keepalive during receive operation, which means the app sits there and chews up memory like crazy when receiving as WELL as when it is sitting there just waiting to go.

I'm losing about 250 megabytes in approximately 12 hours, in chunks somewhere under 100k. I can watch the app memory increase 0.1 MB at a time, about once a second.

I have googled extensively, and all I can find talks about is failing to delete the tcp object over multiple connections, which is definitely not the issue here.

I'm really at a loss. Is the problem related to my use of the socket in a thread? The application (a very complex software defined radio app) runs anywhere from 10 to 16 threads, depending on what it's doing; I run the data transfers in their own thread so they aren't compromised by anything that ties up the main event loop.

I've tried valgrind, but it terminates the app a bit after it tries to start it, well before any of this gets going. I don't think it likes threading, or something. Or maybe it's 10.6.8, but anyway, it doesn't work. Qt 4.7 doesn't integrate it anyway. I know of no way to track memory use from within the application so that I could wrap each send and receive and at least figure out which one (or both?) is responsible.
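
One possible way to watch memory from inside the process is the POSIX `getrusage()` call, which reports the peak resident set size. The sketch below is an assumption about how send and receive could be wrapped, not something taken from the application; `peakResidentSize` and `peakGrowthDuring` are hypothetical names, and the template needs a C++11 compiler if it is called with a lambda.

```cpp
#include <sys/resource.h>

// Returns the process's peak resident set size as reported by getrusage(),
// or -1 on failure. Units are platform-dependent: bytes on OS X,
// kilobytes on Linux.
long peakResidentSize() {
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0) return -1;
    return usage.ru_maxrss;
}

// Hypothetical wrapper: runs `op` and returns how much the peak grew while
// it ran. A value that keeps coming back positive for one operation would
// point at that operation.
template <typename Op>
long peakGrowthDuring(Op op) {
    long before = peakResidentSize();
    op();
    long after = peakResidentSize();
    return after - before;
}
```

Because this measures the peak rather than current usage, it only shows growth, and it cannot attribute growth that happens in another thread while the wrapped call is running.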

*** edit: By changing the rate of the keepalive message, I directly change the rate of the memory leak, and as I think I said above, if the keepalive isn't being sent, there's no memory loss at all.

That's all I can think of to tell you folks; any suggestions are welcome, any illumination about TCP quirks in Qt would be welcome, basically anything. I've spent many days on this and I'm just stonewalled at this juncture.

fyngyrz asked Oct 31 '14 22:10


2 Answers

I found it. Drawing from a non-GUI thread was breaking Qt in a very indirect way. I stopped doing that, and it stopped leaking. Thanks everyone.

It is @Shf who deserves the credit, but sadly, I didn't really understand bounties that well and I probably told him to get in here and answer too late. I will make it up to him -- when he gets my message -- by offering a bounty on the question where he actually provided the critical hint. The bounty will consist of the rest of my stack overflow rep, including what's been earned by this question. Best I can do for now; I'll know better next time. It's definitely been educational.
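
The answer does not show the fix itself, so the following is only a general sketch of the usual pattern: the worker thread hands drawing work to the GUI thread instead of drawing directly. In Qt this is normally done with a queued signal/slot connection or `QMetaObject::invokeMethod` with `Qt::QueuedConnection`. The Qt-free class below, `GuiTaskQueue`, is a hypothetical stand-in that illustrates the same idea in plain C++11.

```cpp
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical stand-in for a GUI event queue: worker threads post work,
// and only the GUI thread runs it.
class GuiTaskQueue {
public:
    // Safe to call from any thread.
    void post(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(mutex_);
        tasks_.push(std::move(task));
    }

    // Call only from the GUI thread, e.g. once per event-loop pass.
    // Returns the number of tasks that were run.
    int drain() {
        std::queue<std::function<void()>> pending;
        {
            // Take the pending tasks while holding the lock, then run them
            // unlocked so a task may itself call post().
            std::lock_guard<std::mutex> lock(mutex_);
            pending.swap(tasks_);
        }
        int ran = 0;
        while (!pending.empty()) {
            pending.front()();
            pending.pop();
            ++ran;
        }
        return ran;
    }

private:
    std::mutex mutex_;
    std::queue<std::function<void()>> tasks_;
};
```

The TCP thread would call `post()` with whatever it wants drawn, and the GUI thread would call `drain()`; nothing posted runs until the GUI thread picks it up.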

fyngyrz answered Sep 17 '22 14:09


Not really enough to work on in terms of code, but I'd look at these things:-

  • How do you know you have a memory leak?
  • How do you know it's not actually heap corruption?
  • There's not a 'new' or 'delete' anywhere in sight. If you're not using them, then the 'leak' is likely in the TCP handling.
  • Sockets : Try closing this and re-opening every so often. Does the leak get cleaned when you do that?
  • You read into RBuf but then assemble from Buf ...?
  • What type is RBuf? Why no bounds checking on the amount you read into it?
  • Wireshark - Look at what's being sent/received on your socket. Is anything unusual going on there? Or is anything going to OTHER sockets?
  • Are you actually reading the bytes from the socket? Check the return value from read, and see this question.
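
The bounds-checking and return-value points above could be handled along these lines. This is a sketch under assumptions: `boundedRead` and `ReadResult` are hypothetical names, and `FakeDevice` exists only so the example runs without Qt. The template expects the same two calls that `QIODevice` provides, `bytesAvailable()` and `read(char*, maxSize)`, where `read` returns the number of bytes read or -1 on error.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Result of one bounded read attempt.
struct ReadResult {
    long long bytesRead;  // bytes actually copied into the buffer, or -1 on error
    bool truncated;       // true if more data was available than the buffer holds
};

// Reads at most `capacity` bytes from `device` into `buf`, so a large
// bytesAvailable() can never overrun the buffer.
template <typename Device>
ReadResult boundedRead(Device& device, char* buf, std::size_t capacity) {
    long long want = device.bytesAvailable();
    bool truncated = false;
    if (want > static_cast<long long>(capacity)) {
        want = static_cast<long long>(capacity);
        truncated = true;
    }
    long long got = (want > 0) ? device.read(buf, want) : 0;
    return ReadResult{got, truncated};
}

// Hypothetical in-memory device used only to exercise boundedRead.
struct FakeDevice {
    std::string pending;
    long long bytesAvailable() const {
        return static_cast<long long>(pending.size());
    }
    long long read(char* out, long long maxSize) {
        if (maxSize < 0) return -1;
        std::size_t n = static_cast<std::size_t>(maxSize);
        if (n > pending.size()) n = pending.size();
        std::memcpy(out, pending.data(), n);
        pending.erase(0, n);
        return static_cast<long long>(n);
    }
};
```

The caller would then pass `bytesRead`, not the earlier `bytesAvailable()` value, to the message assembler, and treat a negative `bytesRead` as an error.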

Roddy answered Sep 21 '22 14:09