So I have a pretty substantial program which is freezing sporadically.
This program uses Qt, open scene graph, and google logging. This freeze happens mid printing in the google logging. The program itself is printing a ton of debugging information. I was able to connect to the program via gdb-server and this is the stack trace:
#0 0x000000397ac0e030 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:82
#1 0x00007f5eecb74aeb in google::LogMessage::SendToLog() () from /lib64/libglog.so.0
#2 0x00007f5eecb71fc7 in google::LogMessage::Flush() () from /lib64/libglog.so.0
#3 0x00007f5eecb721a9 in google::LogMessage::~LogMessage() () from /lib64/libglog.so.0
#4 0x00000000004874a6 in LSDB::process (this=0x242d918, lsp=0x25f9200, circuit=0x24c7af0) at ../src/model/trill/LSDB.cpp:481
#5 0x00000000004a0f6f in Circuit::rx (this=0x24c7af0, eth=0x246fdf0) at ../src/model/trill/Circuit.cpp:355
#6 0x000000000045c950 in Circuit::qt_static_metacall (_o=0x24c7af0, _c=QMetaObject::InvokeMetaMethod, _id=0, _a=0x7fffaade95a0)
at ../src/model/trill/Circuit.moc.cpp:55
#7 0x000000398798cb9f in QMetaObject::activate (sender=0x2470140, m=<optimized out>, local_signal_index=<optimized out>, argv=0x7fffaade95a0)
at kernel/qobject.cpp:3547
#8 0x0000000000459610 in FramePublisher::subscription (this=0x2470140, _t1=0x246fdf0) at ../src/model/system/FramePublisher.moc.cpp:98
#9 0x000000000047c0d6 in FramePublisher::rx (this=0x2470140, frame=0x246fdf0) at ../src/model/system/FramePublisher.hpp:21
#10 0x000000000047bedb in EthernetPort::rx (this=0x246d7a0, frame=0x25a4180) at ../src/model/system/EthernetPort.cpp:81
#11 0x000000000045a208 in EthernetPort::qt_static_metacall (_o=0x246d7a0, _c=QMetaObject::InvokeMetaMethod, _id=1, _a=0x7fffaade9810)
at ../src/model/system/EthernetPort.moc.cpp:51
#12 0x000000398798cb9f in QMetaObject::activate (sender=0x246d7a0, m=<optimized out>, local_signal_index=<optimized out>, argv=0x7fffaade9810)
at kernel/qobject.cpp:3547
#13 0x0000000000459ddc in Port::rx (this=0x246d7a0, _t1=0x25a4180) at ../src/model/system/Port.moc.cpp:110
#14 0x00000000004803a6 in Port::inject (this=0x246d7a0, frame=0x25a4180) at ../src/model/system/Port.cpp:25
...
Notice the freeze itself is happening in __write_nocancel
. There is just a single thread running...
(gdb) info threads Id Target Id Frame
* 1 Thread 21507 0x000000397ac0e030 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:82
Any ideas on what is causing the freeze? Let me know what other information might help.
You said that the output is being written to stderr. If stderr is not being constantly read by a separate process, the I/O buffer may be full so it is waiting for stderr to be read before it writes more.
If process A is writing to stderr and process B normally reads from stderr but is currently waiting on process A, this could cause deadlock. In your case, it could only cause deadlock if the write to stderr is too large since a smaller write would succeed immediately and free process A, thus freeing process B to read from stderr.
This is some speculation on my part but in general I would assume that your issue is that you're waiting to write to a full buffer.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With