
How to list first level directories only in C?

In a terminal I can run ls -d */. Now I want a C program to do that for me, like this:

#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <unistd.h>

int main( void )
{
    int status;

    char *args[] = { "/bin/ls", "-l", NULL };

    if ( fork() == 0 )
        execv( args[0], args );
    else
        wait( &status ); 

    return 0;
}

This runs ls -l on everything. However, when I try:

char *args[] = { "/bin/ls", "-d", "*/",  NULL };

I get a runtime error:

ls: */: No such file or directory

gsamaras asked Nov 28 '22 13:11


2 Answers

The lowest-level way to do this is with the same Linux system calls ls uses.

So look at the output of strace -efile,getdents ls:

execve("/bin/ls", ["ls"], [/* 72 vars */]) = 0
...
openat(AT_FDCWD, ".", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
getdents(3, /* 23 entries */, 32768)    = 840
getdents(3, /* 0 entries */, 32768)     = 0
...

getdents is a Linux-specific system call. The man page says that it's used under the hood by libc's readdir(3) POSIX API function.
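
To make the system-call route concrete, here is a minimal sketch that reads raw directory records with getdents64 through syscall(2) and counts the subdirectories. The helper name count_dirs_getdents and the struct name linux_dirent64_rec are made up for illustration; the record layout follows the one documented in getdents(2), and the code assumes a Linux system where SYS_getdents64 is available.

```c
#define _GNU_SOURCE
#include <dirent.h>        /* DT_DIR, DT_LNK, DT_UNKNOWN */
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout filled in by getdents64, as described in getdents(2).
   libc does not export it, so it is declared here (hypothetical name). */
struct linux_dirent64_rec {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;   /* size of this whole record */
    unsigned char  d_type;
    char           d_name[];   /* NUL-terminated file name */
};

/* Hypothetical helper: number of subdirectories (and symlinks to
   directories) directly inside `path`, or -1 on error. */
long count_dirs_getdents(const char *path)
{
    _Alignas(8) char buf[32768];
    int fd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    long count = 0;
    for (;;) {
        long nread = syscall(SYS_getdents64, fd, buf, sizeof buf);
        if (nread < 0) {
            close(fd);
            return -1;
        }
        if (nread == 0)            /* end of directory */
            break;
        for (long pos = 0; pos < nread; ) {
            struct linux_dirent64_rec *d = (struct linux_dirent64_rec *)(buf + pos);
            pos += d->d_reclen;
            if (strcmp(d->d_name, ".") == 0 || strcmp(d->d_name, "..") == 0)
                continue;
            int is_dir = (d->d_type == DT_DIR);
            if (d->d_type == DT_UNKNOWN || d->d_type == DT_LNK) {
                /* No usable type info, or a symlink: ask stat (follows links). */
                struct stat st;
                is_dir = fstatat(fd, d->d_name, &st, 0) == 0 && S_ISDIR(st.st_mode);
            }
            count += is_dir;
        }
    }
    close(fd);
    return count;
}
```

Calling count_dirs_getdents(".") gives the number of entries that a readdir-based loop would classify as directories, minus "." and "..". In real code the readdir API is almost always the better choice; this only shows what sits underneath it.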


The lowest-level portable way (portable to POSIX systems) is to use the libc functions to open a directory and read its entries. For directories, unlike for regular files, POSIX doesn't specify the exact system call interface.

These functions:

DIR *opendir(const char *name);
struct dirent *readdir(DIR *dirp);

can be used like this:

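// Print all directories, and symlinks to directories, in the CWD.
// Roughly like sh -c 'ls -1UF -d */' (single-column output, no sorting, a / appended to dir names),
// except that ".", ".." and hidden directories are printed too, since no glob filters them out.
// Tested and works on Linux, with and without a working d_type.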

#define _GNU_SOURCE    // includes _BSD_SOURCE for DT_UNKNOWN etc.
#include <dirent.h>
#include <stdint.h>

#include <sys/types.h>
#include <sys/stat.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    DIR *dirhandle = opendir(".");     // POSIX doesn't require this to be a plain file descriptor.  Linux uses open(".", O_DIRECTORY); to implement this
    if (!dirhandle) {
        perror("opendir");
        return EXIT_FAILURE;
    }
    struct dirent *de;
    while ((de = readdir(dirhandle))) { // NULL means end of directory (or an error)
        _Bool is_dir;
    #ifdef _DIRENT_HAVE_D_TYPE
        if (de->d_type != DT_UNKNOWN && de->d_type != DT_LNK) {
           // don't have to stat if we have d_type info, unless it's a symlink (since we stat, not lstat)
           is_dir = (de->d_type == DT_DIR);
        } else
    #endif
        {  // the only method if d_type isn't available,
           // otherwise this is a fallback for FSes where the kernel leaves it DT_UNKNOWN.
           struct stat stbuf;
           // stat follows symlinks, lstat doesn't.
           if (stat(de->d_name, &stbuf) != 0)
               continue;                          // e.g. a dangling symlink: not a directory
           is_dir = S_ISDIR(stbuf.st_mode);
        }

        if (is_dir) {
           printf("%s/\n", de->d_name);
        }
    }
    closedir(dirhandle);
    return 0;
}

There's also a fully compilable example of reading directory entries and printing file info in the stat(3posix) man page (not the Linux stat(2) man page, which has a different example).


The man page for readdir(3) says the Linux declaration of struct dirent is:

   struct dirent {
       ino_t          d_ino;       /* inode number */
       off_t          d_off;       /* not an offset; see NOTES */
       unsigned short d_reclen;    /* length of this record */
       unsigned char  d_type;      /* type of file; not supported
                                      by all filesystem types */
       char           d_name[256]; /* filename */
   };

d_type is either DT_UNKNOWN, in which case you need to stat the entry to learn whether it is a directory, or a definite type such as DT_DIR, in which case you know whether it is a directory without having to stat it. (A DT_LNK entry still needs a stat if symlinks to directories should count, as in the code above.)

Some filesystems, like EXT4 I think, and very recent XFS (with the new metadata version), keep type info in the directory, so it can be returned without having to load the inode from disk. This is a huge speedup for find -name: it doesn't have to stat anything to recurse through subdirs. But for filesystems that don't do this, d_type will always be DT_UNKNOWN, because filling it in would require reading all the inodes (which might not even be loaded from disk).

Sometimes you're just matching on filenames, and don't need type info, so it would be bad if the kernel spent a lot of extra CPU time (or especially I/O time) filling in d_type when it's not cheap. d_type is just a performance shortcut; you always need a fallback (except maybe when writing for an embedded system where you know what FS you're using and that it always fills in d_type, and that you have some way to detect the breakage when someone in the future tries to use this code on another FS type.)

Peter Cordes answered Dec 09 '22 11:12


Unfortunately, all solutions based on shell expansion are limited by the maximum command line length, which varies (run true | xargs --show-limits to find out); on my system, it is about two megabytes. Yes, many will argue that it suffices, as Bill Gates supposedly once did about 640 kilobytes.

(When running certain parallel simulations on non-shared filesystems, I do occasionally have tens of thousands of files in the same directory, during the collection phase. Yes, I could do that differently, but that happens to be the easiest and most robust way to collect the data. Very few POSIX utilities are actually silly enough to assume "X is sufficient for everybody".)
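
From C, the same limit can be queried with sysconf(_SC_ARG_MAX). The sketch below is only a rough estimate under stated assumptions: the helper names max_command_line_bytes and names_fit are invented for illustration, each argument is assumed to cost its bytes plus a terminating NUL plus one pointer, and the space taken by the environment is ignored, so the answer is optimistic.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: limit in bytes for the arguments (and environment)
   handed to exec, or -1 if the system reports no determinable limit. */
long max_command_line_bytes(void)
{
    return sysconf(_SC_ARG_MAX);
}

/* Hypothetical helper: rough check whether `count` names of `name_len`
   bytes each could be passed on one command line. Returns 1 if they
   probably fit, 0 if they certainly do not. */
int names_fit(size_t count, size_t name_len)
{
    long limit = max_command_line_bytes();
    if (limit < 0)
        return 1;                       /* no limit we can reason about */

    /* Assumed cost per argument: the string, its NUL, and one pointer. */
    size_t per_arg = name_len + 1 + sizeof(char *);
    return count <= (size_t)limit / per_arg;
}
```

For example, names_fit(100000, 70) asks whether a hundred thousand names of roughly 70 bytes each would fit; with a limit of about two megabytes they would not.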

Fortunately, there are several solutions. One is to use find instead:

system("/usr/bin/find . -mindepth 1 -maxdepth 1 -type d");

You can also format the output as you wish, not depending on locale:

system("/usr/bin/find . -mindepth 1 -maxdepth 1 -type d -printf '%p\n'");

If you want to sort the output, use \0 as the separator (since filenames are allowed to contain newlines), and -z for sort to use \0 as the separator, too. tr will convert them to newlines for you. Note that inside a C string literal the backslashes must be doubled, because a bare \0 would end the command string early:

system("/usr/bin/find . -mindepth 1 -maxdepth 1 -type d -printf '%p\\0' | sort -z | tr -s '\\0' '\\n'");

If you want the names in an array, use the glob() function instead.
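
A minimal sketch of the glob() route follows. The helper name list_dirs_glob is made up for illustration. It relies on the behavior that a pattern ending in a slash, such as "*/", matches only directories (and symlinks to directories), which mirrors what the shell does for ls -d */. The matching happens inside the process, so the command line length limit does not apply.

```c
#include <glob.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: print every path matching `pattern`, one per line,
   and return how many matched. Returns 0 when nothing matches and -1 on
   any other glob() failure. */
long list_dirs_glob(const char *pattern)
{
    glob_t results;
    int rc = glob(pattern, 0, NULL, &results);
    if (rc != 0) {
        globfree(&results);
        return rc == GLOB_NOMATCH ? 0 : -1;
    }

    /* gl_pathv is the array of names; gl_pathc is its length. */
    for (size_t i = 0; i < results.gl_pathc; i++)
        puts(results.gl_pathv[i]);

    long count = (long)results.gl_pathc;
    globfree(&results);
    return count;
}
```

Calling list_dirs_glob("*/") prints the first-level directories of the current directory, sorted and with a trailing slash, and skips names that start with a dot, just as the shell pattern does.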

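Finally, as I like to harp every now and then, one can use the POSIX nftw() function to implement this internally (the FTW_ACTIONRETVAL flag and the FTW_SKIP_SUBTREE return value used below are glibc extensions, hence the _GNU_SOURCE define):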

#define _GNU_SOURCE
#include <stdio.h>
#include <ftw.h>

#define NUM_FDS 17

int myfunc(const char *path,
           const struct stat *fileinfo,
           int typeflag,
           struct FTW *ftwinfo)
{
    const char *file = path + ftwinfo->base;
    const int depth = ftwinfo->level;

    /* We are only interested in first-level directories.
       Note that depth==0 is the directory itself specified as a parameter.
    */
    if (depth != 1 || (typeflag != FTW_D && typeflag != FTW_DNR))
        return 0;

    /* Don't list names starting with a . */
    if (file[0] != '.')
        printf("%s/\n", path);

    /* Do not recurse. */
    return FTW_SKIP_SUBTREE;
}

The nftw() call that uses the above looks something like this:

if (nftw(".", myfunc, NUM_FDS, FTW_ACTIONRETVAL)) {
    /* An error occurred. */
}

The only "issue" in using nftw() is choosing a good number of file descriptors the function may use (NUM_FDS). POSIX says a process must always be able to have at least 20 open file descriptors. If we subtract the standard ones (input, output, and error), that leaves 17. The above is unlikely to use more than 3, though.

You can find the actual limit using sysconf(_SC_OPEN_MAX), and subtracting the number of descriptors your process may use at the same time. On current Linux systems, the default soft limit is typically 1024 per process.

The good thing is, as long as that number is at least 4 or 5 or so, it only affects the performance: it just determines how deep nftw() can go in the directory tree structure, before it has to use workarounds.
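
The descriptor arithmetic described above can be sketched as a small helper. The name nftw_fd_budget is hypothetical, and the fallback of 20 when sysconf() reports no determinable limit is an assumption based on the POSIX minimum mentioned earlier.

```c
#include <limits.h>
#include <unistd.h>

/* Hypothetical helper: how many descriptors nftw() may be allowed to use,
   given that the rest of the process holds `in_use` of them
   (3 for stdin, stdout and stderr alone). Always returns at least 1. */
int nftw_fd_budget(int in_use)
{
    long limit = sysconf(_SC_OPEN_MAX);
    if (limit < 0)
        limit = 20;                 /* assumed fallback: the POSIX minimum */

    long budget = limit - in_use;
    if (budget < 1)
        budget = 1;                 /* nftw() needs at least one descriptor */
    if (budget > INT_MAX)
        budget = INT_MAX;           /* nftw() takes an int */
    return (int)budget;
}
```

The result can be passed in place of NUM_FDS, for example nftw(".", myfunc, nftw_fd_budget(3), FTW_ACTIONRETVAL).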

If you want to create a test directory with lots of subdirectories, use something like the following Bash:

mkdir lots-of-subdirs
cd lots-of-subdirs
for ((i=0; i<100000; i++)); do mkdir directory-$i-has-a-long-name-since-command-line-length-is-limited ; done

On my system, running

ls -d */

in that directory yields the error bash: /bin/ls: Argument list too long, while the find command and the nftw()-based program both run just fine.

You also cannot remove the directories using rmdir directory-*/ for the same reason. Use

find . -name 'directory-*' -type d -print0 | xargs -r0 rmdir

instead. Or just remove the entire directory and subdirectories,

cd ..
rm -rf lots-of-subdirs

Nominal Animal answered Dec 09 '22 10:12