When I try to open /proc/net/tcp from a child POSIX thread in C++ it fails with a "No such file or directory" error. If I try to open it from the parent thread it succeeds every time, and the process of opening/closing it in the parent thread then makes it succeed about a third of the time in the child thread too. I can open /proc/uptime in the child thread 100% of the time without issue. Here's some example code which can be compiled with "g++ -Wall test.cc -o test -pthread":
#include <iostream>
#include <fstream>
#include <cstring>
#include <cerrno>
#include <pthread.h>
using namespace std;
void * open_test (void *)
{
ifstream in;
in.open("/proc/net/tcp");
if (in.fail())
cout << "Failed - " << strerror(errno) << endl;
else
cout << "Succeeded" << endl;
in.close();
return 0;
}
int main (int argc, char * argv[])
{
open_test(NULL);
pthread_t thread;
pthread_create(&thread, NULL, open_test, NULL);
pthread_exit(0);
}
I am running this on an Ubuntu 12.04 box with an Intel i5-2520M (2 cores * 2 virtual cores) on Linux kernel 3.2.0. Here is the output of me running the above code 6 times in a row:
mike@ung:/tmp$ ./test
Succeeded
Failed - No such file or directory
mike@ung:/tmp$ ./test
Succeeded
Succeeded
mike@ung:/tmp$ ./test
Succeeded
Failed - No such file or directory
mike@ung:/tmp$ ./test
Succeeded
Failed - No such file or directory
mike@ung:/tmp$ ./test
Succeeded
Succeeded
mike@ung:/tmp$ ./test
Succeeded
Failed - No such file or directory
mike@ung:/tmp$
It's probably worth noting that I don't have this problem if I use fork instead of POSIX threads. If I use fork, the child process has no problems reading /proc/net/tcp.
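For comparison, here is a minimal sketch of what a fork-based variant might look like. The code actually used for the fork test is not shown here, so the helper name open_in_forked_child and the use of the exit status to report the result are assumptions made for illustration. The relevant difference is that a forked child is a separate process with its own PID, so /proc/self resolves to the child itself.

```cpp
#include <fstream>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: fork a child, try to open `path` there, and report
// the result through the child's exit status.
// Returns true if the child managed to open the file.
bool open_in_forked_child(const char * path)
{
    pid_t pid = fork();
    if (pid < 0)
        return false;                     // fork failed
    if (pid == 0) {
        std::ifstream in(path);
        _exit(in.is_open() ? 0 : 1);      // child: report result and stop here
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)     // parent: wait for the child
        return false;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Calling open_in_forked_child("/proc/net/tcp") would then correspond to the fork case described above.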
Just a couple of data points to throw in.... It looks like this is a regression in Linux as 2.6.35 seems to work 100% of the time. 3.2.0 pukes most of the time even on my slow old Pentium M based laptop.
As scott points out in his answer, adding a pthread_join(thread, NULL) fixes the symptoms. But why?
Let's put the program in gdb and set up a breakpoint at the point where the open has failed:
(gdb) break test.cc:14
Breakpoint 1 at 0x400c98: file test.cc, line 14.
Then we can observe two different types of behaviour:
(gdb) run […]
Succeeded
[New Thread 0x7ffff7fd1700 (LWP 18937)] // <- child thread
[Thread 0x7ffff7fd3740 (LWP 18934) exited] // <- parent thread
[Switching to Thread 0x7ffff7fd1700 (LWP 18937)]
Breakpoint 1, open_test () at test.cc:14
(gdb) run
Succeeded
[New Thread 0x7ffff7fd1700 (LWP 19427)] // <- child thread
Succeeded
[Thread 0x7ffff7fd1700 (LWP 19427) exited]
[Inferior 1 (process 19424) exited normally]
The first one suggests that the parent exits before the child. Since processes and threads are pretty much the same thing on Linux, this implies that the PID associated with the main process gets cleaned up. Nothing hinders the child thread from running, though: it and its PID are still perfectly valid. It is just that /proc/self points to the PID of the main process, which has been deleted at that point.
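The relationship between the two identifiers can be checked directly. The following is a minimal sketch (ThreadIds, current_ids and ids_in_new_thread are names made up for illustration) that records getpid() and the kernel thread ID from syscall(SYS_gettid) in the calling thread and in a freshly created thread. On Linux, getpid() is the same in every thread of a process, while the thread ID of a spawned thread differs from it.

```cpp
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper type: the process ID and kernel thread ID seen by one thread.
struct ThreadIds {
    pid_t pid;
    pid_t tid;
};

// Collect the IDs of the calling thread.
ThreadIds current_ids()
{
    ThreadIds ids;
    ids.pid = getpid();
    ids.tid = static_cast<pid_t>(syscall(SYS_gettid));
    return ids;
}

// Thread entry point: store this thread's IDs in the caller's struct.
static void * record_ids(void * out)
{
    *static_cast<ThreadIds *>(out) = current_ids();
    return 0;
}

// Start a thread, let it record its IDs, and wait for it to finish.
// Returns {0, 0} if the thread could not be created.
ThreadIds ids_in_new_thread()
{
    ThreadIds ids = { 0, 0 };
    pthread_t thread;
    if (pthread_create(&thread, NULL, record_ids, &ids) == 0)
        pthread_join(thread, NULL);
    return ids;
}
```

Comparing current_ids() with ids_in_new_thread() shows the same pid in both but a different tid, which is why /proc/[tid] stays usable for the child after the main thread's entry is gone.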
This behavior seems to be a kind of bug in the /proc virtual filesystem. If you add this code just before opening the file:
system("ls -l /proc/net /proc/self/net/tcp");
You'll see that /proc/net is a symbolic link to /proc/self/net, and /proc/self/net/tcp is properly listed for both calls to open_test, even when the spawned thread call fails.
Edit: I just realized the above test is bogus, since the self would refer to the shell process of the system call, not this process. Using the following function instead also reveals the bug:
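#include <cstdlib>        // system
#include <sstream>        // ostringstream
#include <sys/syscall.h>  // SYS_gettid
#include <unistd.h>       // getpid, syscall
// List /proc/net plus the net/tcp entries for the process ID and the thread ID.
void ls_command () {
ostringstream cmd;
cmd << "ls -l /proc/net "
<< "/proc/" << getpid()
<< "/net/tcp "
<< "/proc/" << syscall(SYS_gettid)
<< "/net/tcp";
system(cmd.str().c_str());
}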
You'll see that the spawned thread will sometimes not be able to see the parent's /net/tcp file. In fact it has disappeared, since this is the spawned shell's process that is running the ls command.
The workaround below allows the child thread to reliably access what would be its /proc/net/tcp.
My theory is that it is some kind of race condition bug with correctly setting up the /proc/self entry for the thread as the proper blend of parent state and thread specific state. As a test and workaround, I modified the open_test code to use the "process identifier" associated with the thread, rather than trying to access the parent process's (because /proc/self refers to the parent process id, not the thread's).
Edit: As the evidence indicates, the bug has to do with the parent process cleaning up its /proc/self/... state before the child thread has had a chance to read it. I still maintain this to be a bug, since the child thread is still technically part of the process. Its getpid() is still the same before and after the main thread calls pthread_exit(). The /proc entry for the parent process should remain valid until all child threads are completed.
Edit2: Jonas argues this may not be a bug. As evidence of that, there is this from man proc:
/proc/[pid]/fd ... In a multithreaded process, the contents of this directory are not available if the main thread has already terminated (typically by calling pthread_exit(3)).
But then consider this entry for /proc/self in the same man page entry:
/proc/self This directory refers to the process accessing the /proc file system, and is identical to the /proc directory named by the process ID of the same process.
If one is to believe this is not a bug because threads and processes are treated the same in Linux, then threads should have an expectation that /proc/self will work. The bug may easily be fixed by modifying /proc/self to use the /proc/[gettid] value when the /proc/[getpid] version is no longer available, just as the workaround is doing below.
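#include <sstream>        // ostringstream
#include <string>
#include <sys/syscall.h>  // SYS_gettid
#include <unistd.h>       // syscall
void * open_test (void *)
{
ifstream in;
string file = "/proc/net/tcp";
in.open(file.c_str());
if (in.fail()) {
// Fall back to the entry under the thread's own ID.
ostringstream ss;
ss << "/proc/" << syscall(SYS_gettid) << "/net/tcp";
cout << "Can't access " << file
<< ", using " << ss.str() << " instead" << endl;
file = ss.str();
in.clear(); // reset the failbit left by the first open
in.open(file.c_str());
}
if (in.fail())
cout << "Failed - " << strerror(errno) << endl;
else
cout << "Succeeded" << endl;
in.close();
return 0;
}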
If you add a pthread_join(thread, NULL) call before the pthread_exit() call, your program will work correctly.
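A minimal sketch of that change, assuming the open logic from the question is factored into a helper that takes the path as a parameter (can_open, OpenRequest and open_in_joined_thread are hypothetical names, not part of the original program). Because the caller waits in pthread_join, it is still alive while the child opens the file, so /proc/self still resolves.

```cpp
#include <cstddef>
#include <fstream>
#include <pthread.h>

// Hypothetical helper: true if the file can be opened for reading.
bool can_open(const char * path)
{
    std::ifstream in(path);
    return in.is_open();
}

// Argument and result shared with the thread.
struct OpenRequest {
    const char * path;
    bool ok;
};

// Thread entry point: try the open and store the outcome.
static void * open_in_thread(void * arg)
{
    OpenRequest * req = static_cast<OpenRequest *>(arg);
    req->ok = can_open(req->path);
    return 0;
}

// Run the open in a new thread and wait for it before returning.
bool open_in_joined_thread(const char * path)
{
    OpenRequest req = { path, false };
    pthread_t thread;
    if (pthread_create(&thread, NULL, open_in_thread, &req) != 0)
        return false;
    pthread_join(thread, NULL);  // the caller outlives the child thread
    return req.ok;
}
```

Calling open_in_joined_thread("/proc/net/tcp") from main corresponds to inserting pthread_join(thread, NULL) before pthread_exit(0) in the original program.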