 

lseek/write suddenly returns -1 with errno = 9 (Bad file descriptor)

My application uses lseek() to seek to the desired position before writing data. The file is successfully opened with open(), and my application was able to call lseek() and write() many times.

At some point, for some users and in a way that is not easy to reproduce, lseek() returns -1 with errno 9. The file is not closed before this, and the file handle (an int) isn't reset.
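Errno 9 is EBADF ("Bad file descriptor") on Linux. POSIX defines only the symbolic name, not its numeric value, so checking errno == EBADF is more portable than checking for 9. A minimal sketch, assuming a POSIX system and using /dev/null as a stand-in file, that reproduces the same failure by seeking on a descriptor that has already been closed (the function name is hypothetical):

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: opens /dev/null, closes it, then seeks on the stale
 * descriptor number. Returns the errno left by lseek(), or 0 if the seek
 * unexpectedly succeeded. */
static int errno_after_seek_on_closed_fd(void)
{
    int fd = open("/dev/null", O_WRONLY);
    if (fd == -1)
        return errno;           /* could not even open the stand-in file */

    close(fd);                  /* fd is now just a dangling number */

    errno = 0;
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1)
        return errno;           /* expected: EBADF */
    return 0;
}
```

With glibc, strerror(errno_after_seek_on_closed_fd()) gives "Bad file descriptor", the same error the question reports.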

After this, another file is created; open() succeeds again, and lseek() and write() work again.

To make it even worse, this user tried the complete sequence again and all was well.

So my question is, can the OS close the file handle for me for some reason? What could cause this? A file indexer or file scanner of some sort?

What is the best way to solve this? Is this pseudocode the best solution? (Never mind the code layout; I will create functions for it.)

int fd = open(...);
if (fd > -1) {
  off_t result = lseek(fd, ....);
  if (result == -1 && errno == EBADF) { // EBADF is errno 9 on Linux
      close(fd); // make sure we try to close nicely
      fd = open(...);

      result = lseek(fd, ....);
  }
}

Anybody experience with something similar?

Summary: seeking and writing work fine on a given fd, and then suddenly fail with errno = 9 for no apparent reason.

Ger Teunis asked Mar 30 '10 12:03


2 Answers

So my question is, can the OS close the file handle for me for some reason? What could cause this? A file indexer or file scanner of some sort?

No, this will not happen.

What is the best way to solve this; is this pseudo code the best solution? (never mind the code layout, will create functions for it)

No, the best way is to find the bug and fix it.
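One way to narrow down where the descriptor goes bad is to probe it at checkpoints along the code path. The sketch below assumes POSIX fcntl(): fcntl(fd, F_GETFD) fails with EBADF when fd is not open and has no other side effect. fd_is_open() and CHECK_FD() are hypothetical helper names, not part of any library:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Returns 1 if fd currently refers to an open file, 0 if it does not.
 * F_GETFD only reads the descriptor flags, so probing is harmless. */
static int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}

/* Hypothetical debugging macro: place it at several points on the path that
 * uses fd to bracket the spot where the descriptor stops being valid. */
#define CHECK_FD(fd)                                                \
    do {                                                            \
        if (!fd_is_open(fd))                                        \
            fprintf(stderr, "%s:%d: fd %d is no longer open\n",     \
                    __FILE__, __LINE__, (fd));                      \
    } while (0)
```

Such a probe only detects a descriptor that is closed at the moment of the check. If another open() has already reused the number, the probe reports it as open even though it now refers to a different file.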

Anybody experience with something similar?

I've seen fds get messed up many times, sometimes resulting in EBADF and sometimes blowing up spectacularly. The causes have been:

  • Buffer overflows: overflowing something and writing a nonsense value into an 'int fd;' variable.
  • Silly bugs in some corner case, where someone wrote if(fd = foo[i].fd) when they meant if(fd == foo[i].fd).
  • Race conditions between threads: one thread closes the wrong file descriptor, one that another thread still wants to use.
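The race-condition case often comes down to one descriptor number being closed twice: once legitimately, and once more through a stale copy after open() has already handed the same number to another file. Because POSIX requires open() to return the lowest-numbered unused descriptor, this can be reproduced deterministically in a single thread. A sketch under those assumptions, with hypothetical variable names and /dev/null standing in for both files:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Shows a stale copy of a descriptor number closing the wrong file.
 * Returns the errno from the final lseek() on the second file (0 if it
 * succeeded). Assumes a single thread and that /dev/null exists. */
static int stale_close_demo(void)
{
    int log_fd = open("/dev/null", O_WRONLY);   /* first file */
    if (log_fd == -1)
        return errno;
    int stale_copy = log_fd;                    /* copy kept elsewhere */
    close(log_fd);                              /* legitimate close */

    /* open() returns the lowest unused number, which in a single thread
     * is exactly the number that was just freed. */
    int data_fd = open("/dev/null", O_WRONLY);  /* second file */
    if (data_fd == -1)
        return errno;

    close(stale_copy);                          /* BUG: closes data_fd */

    errno = 0;
    if (lseek(data_fd, 0, SEEK_SET) == (off_t)-1)
        return errno;                           /* EBADF "for no reason" */
    return 0;
}
```

With several threads the reuse window depends on timing, which would fit a failure that only some users hit and that is hard to reproduce.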

If you can find a way to reproduce this problem, run your program under 'strace' so you can see what's going on.

nos answered Sep 22 '22 11:09


The OS will not close file handles at random (I am assuming a Unix-like system). If your file handle is closed, then something is wrong with your code, most probably somewhere else (thanks to the C language and the Unix API, this can be anywhere in the code, e.g. a slight buffer overflow in some piece of code that looks completely unrelated).

Your pseudo-code is the worst solution, since it will give you the impression of having fixed the problem, while the bug still lurks.

I suggest that you add debug prints (i.e. printf() calls) wherever you open and close a file or socket. Also, try Valgrind.
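A sketch of the debug-print suggestion: thin wrappers that log every open and close with the call site and the descriptor number, so a close() of the wrong number stands out in the log. It assumes POSIX open()/close(); traced_open(), traced_close() and the TRACED_* macros are hypothetical names:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Logs "file:line: open("path") -> fd" for every open. */
static int traced_open(const char *path, int flags, mode_t mode,
                       const char *file, int line)
{
    int fd = open(path, flags, mode);   /* mode only matters when creating */
    fprintf(stderr, "%s:%d: open(\"%s\") -> %d\n", file, line, path, fd);
    return fd;
}

/* Logs "file:line: close(fd) -> rc"; rc is -1 for an already-closed fd. */
static int traced_close(int fd, const char *file, int line)
{
    int rc = close(fd);
    fprintf(stderr, "%s:%d: close(%d) -> %d\n", file, line, fd, rc);
    return rc;
}

#define TRACED_OPEN(path, flags, mode) \
    traced_open((path), (flags), (mode), __FILE__, __LINE__)
#define TRACED_CLOSE(fd) traced_close((fd), __FILE__, __LINE__)
```

In the resulting log, a second close of the same number, or a close issued from a call site that never opened that number, is the pattern to look for.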

(Just yesterday I had a spooky off-by-one buffer overflow that damaged the least significant byte of a temporary slot the compiler had generated to save a CPU register; the indirect effect was that a structure in another function appeared to be shifted by a few bytes. It took me quite some time to understand what was going on, including some thorough reading of MIPS assembly code.)

Thomas Pornin answered Sep 24 '22 11:09