I am writing a program that finds all sub-directories of a parent directory containing a huge number of files, using os.File.Readdir. Running strace to count the system calls showed that the Go version issues an lstat() for every file and directory in the parent directory. (I am testing this with the /usr/bin directory for now.)
Go code:
package main

import (
	"fmt"
	"os"
)

func main() {
	x, err := os.Open("/usr/bin")
	if err != nil {
		panic(err)
	}
	y, err := x.Readdir(0)
	if err != nil {
		panic(err)
	}
	for _, i := range y {
		fmt.Println(i)
	}
}
Strace on the program (without following threads):
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 93.62    0.004110           2      2466           write
  3.46    0.000152           7        22           getdents64
  2.92    0.000128           0      2466           lstat   // this increases with the number of files
  0.00    0.000000           0        11           mmap
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0       114           rt_sigaction
  0.00    0.000000           0         8           rt_sigprocmask
  0.00    0.000000           0         1           sched_yield
  0.00    0.000000           0         3           clone
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         2           sigaltstack
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           gettid
  0.00    0.000000           0        57           futex
  0.00    0.000000           0         1           sched_getaffinity
  0.00    0.000000           0         1           openat
------ ----------- ----------- --------- --------- ----------------
100.00    0.004390                  5156           total
I tested the same with C's readdir() and did not see this behaviour.
C code:
#include <stdio.h>
#include <dirent.h>

int main (void) {
    DIR* dir_p;
    struct dirent* dir_ent;

    dir_p = opendir ("/usr/bin");
    if (dir_p != NULL) {
        // The readdir() function returns a pointer to a dirent structure representing the next
        // directory entry in the directory stream pointed to by dirp.
        // It returns NULL on reaching the end of the directory stream or if an error occurred.
        while ((dir_ent = readdir (dir_p)) != NULL) {
            // printf("%s", dir_ent->d_name);
            // printf("%d", dir_ent->d_type);
            if (dir_ent->d_type == DT_DIR) {
                printf("%s is a directory", dir_ent->d_name);
            } else {
                printf("%s is not a directory", dir_ent->d_name);
            }
            printf("\n");
        }
        (void) closedir(dir_p);
    } else {
        perror ("Couldn't open the directory");
    }
    return 0;
}
Strace on the program:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.000128           0      2468           write
  0.00    0.000000           0         1           read
  0.00    0.000000           0         3           open
  0.00    0.000000           0         3           close
  0.00    0.000000           0         4           fstat
  0.00    0.000000           0         8           mmap
  0.00    0.000000           0         3           mprotect
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0         3           brk
  0.00    0.000000           0         3         3 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         4           getdents
  0.00    0.000000           0         1           arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.000128                  2503         3 total
I am aware that the only fields in the dirent structure that are mandated by POSIX.1 are d_name and d_ino, but I am writing this for a specific filesystem.
I tried *File.Readdirnames(), which doesn't use lstat and returns the names of all files and directories, but checking whether each returned name is a file or a directory would require an lstat again.
lstat() on all the files unnecessarily. I could see that the C program uses the following syscalls:
open("/usr/bin", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=69632, ...}) = 0
brk(NULL) = 0x1098000
brk(0x10c1000) = 0x10c1000
getdents(3, /* 986 entries */, 32768) = 32752
C and Go version, which will be hitting the disk.
The package dirent looks like it accomplishes what you are looking for. Below is your C example written in Go:
package main

import (
	"bytes"
	"fmt"
	"io"

	"github.com/EricLagergren/go-gnulib/dirent"
	"golang.org/x/sys/unix"
)

func int8ToString(s []int8) string {
	var buff bytes.Buffer
	for _, chr := range s {
		if chr == 0x00 {
			break
		}
		buff.WriteByte(byte(chr))
	}
	return buff.String()
}

func main() {
	stream, err := dirent.Open("/usr/bin")
	if err != nil {
		panic(err)
	}
	defer stream.Close()
	for {
		entry, err := stream.Read()
		if err != nil {
			if err == io.EOF {
				break
			}
			panic(err)
		}
		name := int8ToString(entry.Name[:])
		if entry.Type == unix.DT_DIR {
			fmt.Printf("%s is a directory\n", name)
		} else {
			fmt.Printf("%s is not a directory\n", name)
		}
	}
}
Starting with Go 1.16 (Feb 2021), a good option is os.ReadDir:
package main

import "os"

func main() {
	files, e := os.ReadDir(".")
	if e != nil {
		panic(e)
	}
	for _, file := range files {
		println(file.Name())
	}
}
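os.ReadDir returns fs.DirEntry instead of fs.FileInfo. A DirEntry exposes only Name, IsDir, Type, and Info; Size and ModTime are not on it directly and are reachable only through Info(). The entry type can therefore come from the directory listing itself, without a per-file lstat where the filesystem reports it, which makes the process more efficient.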
https://golang.org/pkg/os#ReadDir