
Not checking close()'s return value: how serious, really?

Tags: c, linux, posix, bsd

Linux's "man close" warns (SVr4, 4.3BSD, POSIX.1-2001):

Not checking the return value of close() is a common but nevertheless serious programming error. It is quite possible that errors on a previous write(2) operation are first reported at the final close(). Not checking the return value when closing the file may lead to silent loss of data. This can especially be observed with NFS and with disk quota.

I can believe that this error is common (at least in applications; I'm no kernel hacker). But how serious is it, today or at any point in the past three decades? In particular:

Is there a simple, reproducible example of such silent loss of data? Even a contrived one like sending SIGKILL during close()?

If such an example exists, can the data loss be handled more gracefully than just

printf("Sorry, dude, you lost some data.\n"); ?

Camille Goudeseune asked Sep 27 '13 17:09



1 Answer

Calling POSIX's close() may lead to errno being set to:

  1. EBADF: Bad file number
  2. EINTR: Interrupted system call
  3. EIO: I/O error (from POSIX Specification Issue 6 on)
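As a minimal sketch of what "checking the return value" can look like: the helper below closes a descriptor once and hands the errno value back to the caller. The name close_checked is hypothetical, not part of any library, and logging to stderr is just one possible reaction.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: close fd once and report any failure.
 * Returns 0 on success, otherwise the errno value set by close(). */
int close_checked(int fd)
{
    if (close(fd) == 0)
        return 0;

    int err = errno;  /* save it before another call can overwrite it */
    fprintf(stderr, "close(%d) failed: %s\n", fd, strerror(err));
    return err;
}
```

A caller can then compare the result against EBADF, EINTR or EIO and react differently to each.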

Different errors indicate different issues:

  1. EBADF indicates a programming error: the program should have kept track of which file/socket descriptors are still open. I'd consider testing for this error a quality-assurance measure.

  2. EINTR seems to be the most difficult to handle, as it is not clear whether the file/socket descriptor passed is still valid after the function has returned (under Linux it probably is not: http://lkml.org/lkml/2002/7/17/165). If you observe this error, you should perhaps check how the program handles signals.

  3. EIO is expected to appear only under special conditions, as mentioned in the man pages. That alone is reason enough to track this error: if it occurs, most likely something went really wrong.
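One way to keep track of open descriptors, and so avoid both EBADF and an accidental second close() after EINTR, is to invalidate the variable as soon as close() has been called. This is only a sketch: close_once is a hypothetical name, and marking the descriptor invalid even on failure assumes the Linux behaviour described in the linked thread (other systems may leave the descriptor open after EINTR).

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: close *fdp at most once and then forget it.
 * Returns 0 on success (or if *fdp was already marked closed),
 * otherwise the errno value set by close(). */
int close_once(int *fdp)
{
    if (*fdp < 0)
        return 0;             /* already closed: nothing to do */

    int err = (close(*fdp) == 0) ? 0 : errno;

    /* Assumption (Linux): the descriptor is released even when close()
     * fails with EINTR or EIO, so it is marked invalid unconditionally
     * and close() is never retried on the same number. */
    *fdp = -1;
    return err;
}
```

Calling close_once() a second time on the same variable is then a harmless no-op instead of a close() on a number that another thread may already have reused.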

All in all, each of these errors has at least one good reason to be caught, so just do it! ;-)

Possible specific reactions:

  1. In terms of stability, ignoring an EBADF might be acceptable; however, the error should not happen at all. As stated, fix your code, because the program does not seem to really know what it is doing.

  2. Observing an EINTR could indicate that signals are running wild. This is not nice. Definitely go for the root cause. As it is unclear whether the descriptor got closed or not, go for a system restart asap.

  3. Running into an EIO could definitely indicate a serious failure in the hardware*1 involved. However, before the strongly recommended shutdown of the system, it might be worth simply retrying the operation, although the same concern applies as for EINTR: it is uncertain whether the descriptor really got closed or not. If it did get closed, it is a bad idea to close it again, as the descriptor might already be in use by another thread. Go for shutdown and hardware*1 replacement asap.


*1 Hardware is to be seen in a broader sense here: an NFS server acts as a disk, so the EIO could simply be due to a misconfigured server or network, or whatever else is involved in the NFS connection.
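As for handling the loss more gracefully than a printf(): one option, which goes beyond what is said above and follows the hint in the notes of the Linux close(2) man page, is to call fsync() before close(). A deferred write error then shows up while the descriptor is still valid and the data is still in memory. The sketch below assumes a plain file; save_buffer is a hypothetical name, and the open() flags and the 0600 mode are illustrative choices.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path and report the first error.
 * Returns 0 on success, otherwise an errno value. */
int save_buffer(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return errno;

    int err = 0;
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* unlike close(), write() can be retried */
            err = errno;
            break;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Push the data out now, so that a deferred write error (quota,
     * NFS, bad disk) is reported here rather than at close(). */
    if (err == 0 && fsync(fd) != 0)
        err = errno;

    /* Close exactly once; keep the first error seen. */
    if (close(fd) != 0 && err == 0)
        err = errno;

    return err;
}
```

If save_buffer() returns non-zero, the caller still holds buf and can try another location or ask the user what to do, rather than only reporting that data was lost.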

alk answered Sep 20 '22 21:09
