
Opening /proc/net/tcp in C++ from a POSIX thread fails most of the time

When I try to open /proc/net/tcp from a child POSIX thread in C++ it fails with a "No such file or directory" error. If I try to open it from the parent thread it succeeds every time, and the process of opening/closing it in the parent thread then makes it succeed about a third of the time in the child thread too. I can open /proc/uptime in the child thread 100% of the time without issue. Here's some example code which can be compiled with "g++ -Wall test.cc -o test -pthread":

#include <iostream>
#include <fstream>
#include <cstring>
#include <cerrno>
#include <pthread.h>

using namespace std;

void * open_test (void *)
{
    ifstream in;
    in.open("/proc/net/tcp");
    if (in.fail())
        cout << "Failed - " << strerror(errno) << endl;
    else
        cout << "Succeeded" << endl;
    in.close();

    return 0;
}

int main (int argc, char * argv[])
{
    open_test(NULL);

    pthread_t thread;
    pthread_create(&thread, NULL, open_test, NULL);
    pthread_exit(0);
}

I am running this on an Ubuntu 12.04 box with an Intel i5-2520M (2 cores * 2 virtual cores) on Linux kernel 3.2.0. Here is the output of me running the above code 6 times in a row:

mike@ung:/tmp$ ./test
Succeeded
Failed - No such file or directory
mike@ung:/tmp$ ./test
Succeeded
Succeeded
mike@ung:/tmp$ ./test
Succeeded
Failed - No such file or directory
mike@ung:/tmp$ ./test
Succeeded
Failed - No such file or directory
mike@ung:/tmp$ ./test
Succeeded
Succeeded
mike@ung:/tmp$ ./test
Succeeded
Failed - No such file or directory
mike@ung:/tmp$

It's probably worth noting that I don't have this problem if I use fork() instead of POSIX threads. If I use fork(), the child process has no problems reading /proc/net/tcp.
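For comparison, here is a minimal sketch of what the fork() variant could look like. The helper name open_in_forked_child is hypothetical and not part of my original test program; it only illustrates the idea that the forked child is a process with its own PID and reports back whether the open worked.

```cpp
#include <fstream>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not from the original program): fork a child, try to
// open `path` there, and tell the parent whether the open succeeded.
bool open_in_forked_child(const char* path)
{
    pid_t pid = fork();
    if (pid < 0)
        return false;                 // fork itself failed

    if (pid == 0) {
        // Child process: it has its own PID, so /proc/self refers to it.
        std::ifstream in(path);
        _exit(in.is_open() ? 0 : 1);  // the exit status carries the result
    }

    // Parent: wait for the child and decode its exit status.
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return false;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Calling open_in_forked_child("/proc/net/tcp") from main would be the fork() counterpart of the threaded test above.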

Just a couple of data points to throw in.... It looks like this is a regression in Linux as 2.6.35 seems to work 100% of the time. 3.2.0 pukes most of the time even on my slow old Pentium M based laptop.

Mike Cardwell asked Jul 20 '12 13:07


3 Answers

As scott points out in his answer, adding a pthread_join(thread, NULL) fixes the symptoms. But why?

Let's put the program in gdb and set up a breakpoint at the point where the open has failed:

(gdb) break test.cc:14
Breakpoint 1 at 0x400c98: file test.cc, line 14.

Then we can observe two different types of behaviour:

  1. (gdb) run […]
    Succeeded
    [New Thread 0x7ffff7fd1700 (LWP 18937)]          // <- child thread
    [Thread 0x7ffff7fd3740 (LWP 18934) exited]       // <- parent thread
    [Switching to Thread 0x7ffff7fd1700 (LWP 18937)]
    Breakpoint 1, open_test () at test.cc:14
    
  2. (gdb) run
    Succeeded
    [New Thread 0x7ffff7fd1700 (LWP 19427)]          // <- child thread
    Succeeded
    [Thread 0x7ffff7fd1700 (LWP 19427) exited]
    [Inferior 1 (process 19424) exited normally]
    

The first one suggests that the parent thread exits before the child thread. Since processes and threads are pretty much the same thing on Linux, this implies that the PID associated with the main thread gets cleaned up. Nothing prevents the child thread from running, though: it and its PID are still perfectly valid. It is just that /proc/self points to the PID of the main thread, which has already been cleaned up at that point.
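To make the distinction between the two IDs concrete, here is a minimal sketch that records what a child thread sees. The names ThreadIds, record_ids and ids_seen_by_child_thread are hypothetical and only serve the illustration. On Linux, getpid() returns the same value in every thread of a process, while the kernel thread ID obtained through syscall(SYS_gettid) differs for the child thread.

```cpp
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical record (not from the question) of what one thread sees.
struct ThreadIds {
    pid_t pid;  // getpid(): the process ID, shared by all threads
    pid_t tid;  // SYS_gettid: the kernel ID of this particular thread
};

// Thread body: store both IDs as seen from inside the thread.
static void* record_ids(void* arg)
{
    ThreadIds* ids = static_cast<ThreadIds*>(arg);
    ids->pid = getpid();
    ids->tid = static_cast<pid_t>(syscall(SYS_gettid));
    return 0;
}

// Run record_ids in a new thread and join it, so the result is complete
// before the caller looks at it. Both fields stay -1 if the thread could
// not be created.
ThreadIds ids_seen_by_child_thread()
{
    ThreadIds ids = { -1, -1 };
    pthread_t thread;
    if (pthread_create(&thread, 0, record_ids, &ids) == 0)
        pthread_join(thread, 0);
    return ids;
}
```

The pid field matches the main thread's getpid(), which is the ID that /proc/self resolves to, whereas the tid field is the ID that still names a live thread after the main thread has exited.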

Jonas Schäfer answered Nov 14 '22 21:11


This behavior seems to be a kind of bug in the /proc virtual filesystem. If you add this code just before opening the file:

    system("ls -l /proc/net /proc/self/net/tcp");

You'll see that /proc/net is a symbolic link to /proc/self/net, and /proc/self/net/tcp is properly listed for both calls to open_test, even when the call in the spawned thread fails.

Edit: I just realized the above test is bogus, since "self" would refer to the shell process spawned by the system() call, not to this process. Using the following function instead also reveals the bug:

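#include <cstdlib>        // system
#include <sstream>        // ostringstream
#include <sys/syscall.h>  // SYS_gettid
#include <unistd.h>       // getpid, syscall

// List /proc/net plus the net/tcp entries of this process and of this thread.
void ls_command () {
    ostringstream cmd;
    cmd << "ls -l /proc/net "
        << "/proc/" << getpid()
        << "/net/tcp "
        << "/proc/" << syscall(SYS_gettid)
        << "/net/tcp";
    system(cmd.str().c_str());
}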

You'll see that the spawned thread will sometimes not be able to see the parent's /net/tcp file. In fact, the file really has disappeared, since it is the spawned shell's process, not the thread itself, that runs the ls command.

The workaround below allows the child thread to reliably access what would be its /proc/net/tcp.

My theory is that it is some kind of race condition bug in setting up the /proc/self entry for the thread as the proper blend of parent state and thread-specific state. As a test and workaround, I modified the open_test code to use the "process identifier" associated with the thread, rather than trying to access the parent process's (because /proc/self refers to the parent process ID, not the thread's).

Edit: As the evidence indicates, the bug has to do with the parent process cleaning up its /proc/self/... state before the child thread has had a chance to read it. I still maintain this to be a bug, since the child thread is still technically part of the process. Its getpid() is still the same before and after the main thread calls pthread_exit(). The /proc entry for the parent process should remain valid until all child threads are completed.

Edit2: Jonas argues this may not be a bug. As evidence of that, there is this from man proc:

       /proc/[pid]/fd
              ...
              In  a  multithreaded process, the contents of this directory are
              not available if the main thread has already  terminated  (typically
              by calling pthread_exit(3)).

But then consider this entry for /proc/self in the same man page entry:

       /proc/self
              This directory refers to the process accessing  the  /proc  file
              system,  and  is  identical  to the /proc directory named by the
              process ID of the same process.

If one is to believe this is not a bug because threads and processes are treated the same in Linux, then threads should be able to expect that /proc/self will work. The bug could easily be fixed by making /proc/self fall back to the /proc/[gettid] entry when the /proc/[getpid] version is no longer available, just as the workaround below does.

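// Headers needed in addition to those in the question's program.
#include <sstream>        // ostringstream
#include <string>         // string
#include <sys/syscall.h>  // SYS_gettid
#include <unistd.h>       // syscall

void * open_test (void *)
{
    ifstream in;
    string file = "/proc/net/tcp";
    in.open(file.c_str());
    if (in.fail()) {
        // Fall back to the entry named after this thread's own ID.
        ostringstream ss;
        ss << "/proc/" << syscall(SYS_gettid) << "/net/tcp";
        cout << "Can't access " << file
             << ", using " << ss.str() << " instead" << endl;
        file = ss.str();
        in.clear();  // reset the failbit left behind by the first open
        in.open(file.c_str());
    }
    if (in.fail())
        cout << "Failed - " << strerror(errno) << endl;
    else
        cout << "Succeeded" << endl;
    in.close();

    return 0;
}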
jxh answered Nov 14 '22 20:11


If you add a pthread_join(thread, NULL) call before the pthread_exit() call, your program will work correctly.
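A minimal sketch of that idea, written as a helper so the join is easy to see. The names try_open and open_in_joined_thread are hypothetical, and the thread body is a simplified adaptation of open_test from the question that returns its result instead of printing it.

```cpp
#include <fstream>
#include <pthread.h>

// Hypothetical thread body (adapted from open_test in the question): the
// argument is the path to open; the return value is non-null on success.
static void* try_open(void* arg)
{
    std::ifstream in(static_cast<const char*>(arg));
    return in.is_open() ? arg : 0;
}

// Start the thread and join it, so the calling (main) thread is still
// alive for the whole time the child thread is working.
bool open_in_joined_thread(const char* path)
{
    pthread_t thread;
    if (pthread_create(&thread, 0, try_open, const_cast<char*>(path)) != 0)
        return false;

    void* result = 0;
    if (pthread_join(thread, &result) != 0)
        return false;
    return result != 0;
}
```

Calling open_in_joined_thread("/proc/net/tcp") from main and only then returning or calling pthread_exit() corresponds to the fixed program: the main thread cannot exit while the child thread is still opening the file.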

scott answered Nov 14 '22 20:11