
Delete files while reading directory with readdir()

Tags:

c

My code is something like this:

DIR *pDir = opendir("/path/to/my/dir");
struct dirent *pFile = NULL;
while ((pFile = readdir(pDir)) != NULL) {
   // Check if it is a .zip file (name must end in ".zip")
   size_t len = strlen(pFile->d_name);
   if (len > 4 && strcmp(pFile->d_name + len - 4, ".zip") == 0) {
      // It is a .zip file: delete it and the matching log file
      char zipname[200];
      snprintf(zipname, sizeof(zipname), "/path/to/my/dir/%s", pFile->d_name);
      unlink(zipname);
      // Strip off ".zip" and append ".log"
      char logname[200];
      snprintf(logname, sizeof(logname), "/path/to/my/dir/%.*s.log",
               (int)(len - 4), pFile->d_name);
      unlink(logname);
   }
}
closedir(pDir);

(this code is untested and purely an example)

The point is: Is it allowed to delete a file in a directory while looping through the directory with readdir()? Or will readdir() still find the deleted .log file?

To1ne asked Nov 04 '09 20:11


2 Answers

Quote from POSIX readdir:

If a file is removed from or added to the directory after the most recent call to opendir() or rewinddir(), whether a subsequent call to readdir() returns an entry for that file is unspecified.

So, my guess is ... it depends.

It depends on the OS, on the time of day, on the relative order of the files added/deleted, ...

And, as a further point, between the time the readdir() function returns and you try to unlink() the file, some other process could have deleted that file and your unlink() fails.
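To illustrate that last point, here is a minimal sketch of an unlink() wrapper that treats "someone else already removed it" as success. The name remove_if_present is hypothetical, and whether ENOENT should count as success is an assumption that depends on what your program needs.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: remove a file, treating "already gone" as success.
 * Returns 0 if the file does not exist afterwards, -1 on any other error
 * (errno is left as set by unlink()). */
static int remove_if_present(const char *path)
{
    assert(path != NULL);
    if (unlink(path) == 0)
        return 0;   /* we removed it ourselves */
    if (errno == ENOENT)
        return 0;   /* another process removed it between readdir() and unlink() */
    return -1;      /* a real failure, e.g. EACCES or EPERM */
}
```

Inside the readdir() loop you would call remove_if_present() on the full path instead of calling unlink() directly, and only report an error when it returns -1.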


Edit

I tested with this program:

#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

int main(void) {
  struct dirent *de;
  DIR *dd;

  /* create files `one.zip` and `one.log` before entering the readdir() loop */
  printf("creating `one.log` and `one.zip`\n");
  system("touch one.log"); /* assume it worked */
  system("touch one.zip"); /* assume it worked */

  dd = opendir("."); /* assume it worked */
  while ((de = readdir(dd)) != NULL) {
    printf("found %s\n", de->d_name);
    if (strstr(de->d_name, ".zip")) {
      char logname[1200];
      size_t i;
      if (*de->d_name == 'o') {
        /* create `two.zip` and `two.log` when the program finds `one.zip` */
        printf("creating `two.zip` and `two.log`\n");
        system("touch two.zip"); /* assume it worked */
        system("touch two.log"); /* assume it worked */
      }
      printf("unlinking %s\n", de->d_name);
      if (unlink(de->d_name)) perror("unlink");
      strcpy(logname, de->d_name);
      i = strlen(logname);
      logname[i-3] = 'l';
      logname[i-2] = 'o';
      logname[i-1] = 'g';
      printf("unlinking %s\n", logname);
      if (unlink(logname)) perror("unlink");
    }
  }
  closedir(dd); /* assume it worked */
  return 0;
}

On my computer, readdir() finds deleted files and does not find files created between opendir() and readdir(). But it may be different on another computer; it may be different on my computer if I compile with different options; it may be different if I upgrade the kernel; ...

pmg answered Nov 05 '22 03:11


I'm testing my new Linux reference book, The Linux Programming Interface by Michael Kerrisk, and it says the following:

SUSv3 explicitly notes that it is unspecified whether readdir() will return a filename that has been added to or removed from the directory since the last call to opendir() or rewinddir(). All filenames that have been neither added nor removed since the last such call are guaranteed to be returned.

I think that what is unspecified is what happens to dirents not yet scanned. Once an entry has been returned, it is 100% guaranteed that it will not be returned again, whether or not you unlink the current dirent.

Also note the guarantee provided by the second sentence. Since you are leaving the other files alone and only unlinking the current entry for the zip file, SUSv3 guarantees that all the other files will be returned. What happens to the log file is unspecified: it may or may not be returned by readdir(), but in your case that shouldn't be harmful.
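If you would rather not depend on the unspecified case at all, one option is to read the whole directory first and delete only after closedir(). The following is a rough sketch of that two-pass idea, not a tested replacement for the OP's code; the names has_suffix and remove_zips_and_logs are hypothetical, and the 4096-byte path buffer is an assumption.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: 1 if name ends with suffix (and is longer than it). */
static int has_suffix(const char *name, const char *suffix)
{
    size_t n = strlen(name), s = strlen(suffix);
    return n > s && strcmp(name + n - s, suffix) == 0;
}

/* Pass 1: collect the names of all "*.zip" entries in dirpath.
 * Pass 2: after closedir(), unlink each zip and its matching ".log".
 * Returns the number of zip files removed, or -1 on error. */
static int remove_zips_and_logs(const char *dirpath)
{
    assert(dirpath != NULL);
    DIR *dir = opendir(dirpath);
    if (dir == NULL)
        return -1;

    char **names = NULL;
    size_t count = 0, cap = 0;
    struct dirent *de;
    int rc = 0;

    while ((de = readdir(dir)) != NULL) {
        if (!has_suffix(de->d_name, ".zip"))
            continue;
        if (count == cap) {                       /* grow the name list */
            size_t newcap = cap ? cap * 2 : 16;
            char **tmp = realloc(names, newcap * sizeof *tmp);
            if (tmp == NULL) { rc = -1; break; }
            names = tmp;
            cap = newcap;
        }
        size_t len = strlen(de->d_name) + 1;      /* copy: d_name may be reused */
        names[count] = malloc(len);
        if (names[count] == NULL) { rc = -1; break; }
        memcpy(names[count], de->d_name, len);
        count++;
    }
    closedir(dir);                                /* nothing deleted so far */

    int removed = 0;
    for (size_t i = 0; i < count; i++) {
        if (rc == 0) {
            char path[4096];                      /* assumed large enough */
            int stem = (int)(strlen(names[i]) - 4);
            snprintf(path, sizeof path, "%s/%s", dirpath, names[i]);
            if (unlink(path) == 0)
                removed++;
            snprintf(path, sizeof path, "%s/%.*s.log", dirpath, stem, names[i]);
            if (unlink(path) != 0 && errno != ENOENT)
                perror(path);                     /* a missing log is not an error */
        }
        free(names[i]);
    }
    free(names);
    return rc == 0 ? removed : -1;
}
```

The cost is the memory for the list of names. In exchange, no file is added or removed while the directory stream is open, so the "unspecified" clause never comes into play (other processes can of course still change the directory).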

The reason I explored this question is to find an efficient way to close file descriptors in a child process before exec().

The suggested way in APUE from Stevens is to do the following:

int max_open = sysconf(_SC_OPEN_MAX);
for (int i = 0; i < max_open; ++i)
    close(i);

but I am thinking of using code similar to the OP's to scan the /dev/fd/ directory, so that I know exactly which fds I need to close. (Special note to myself: skip over the dirfd contained in the DIR handle.)
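Something along these lines is what I have in mind. It is only a sketch: close_fds_from is a hypothetical name, and it assumes the system exposes open descriptors as numeric entries under /dev/fd (as Linux does). Note that it closes descriptors while the scan is still running, so it relies on the reasoning above that an entry already returned will not come back.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: close every open descriptor >= lowfd by scanning
 * /dev/fd. Returns the number of descriptors closed, or -1 if /dev/fd
 * cannot be opened. */
static int close_fds_from(int lowfd)
{
    assert(lowfd >= 0);
    DIR *dir = opendir("/dev/fd");
    if (dir == NULL)
        return -1;

    int self = dirfd(dir);   /* descriptor used by the scan itself */
    int closed = 0;
    struct dirent *de;

    while ((de = readdir(dir)) != NULL) {
        char *end;
        errno = 0;
        long fd = strtol(de->d_name, &end, 10);
        if (errno != 0 || end == de->d_name || *end != '\0')
            continue;        /* not a number: skips "." and ".." */
        if (fd < lowfd || fd > INT_MAX || fd == self)
            continue;        /* keep low fds and the DIR's own fd */
        if (close((int)fd) == 0)
            closed++;
    }
    closedir(dir);           /* this releases the DIR's own fd */
    return closed;
}
```

In the child, calling close_fds_from(3) before exec() would leave stdin, stdout and stderr open and close everything else, without looping up to _SC_OPEN_MAX.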

lano1106 answered Nov 05 '22 02:11