I'm debugging a system load problem that a customer encounters on their production system. They've written a test application that simulates the load so the problem can be reproduced.
In this particular workload, one of the things the coder did was this:
while (1) {
    initialize inotify
    watch a directory for events
    receive event
    process event
    remove watch
    close inotify fd
}
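For illustration, a minimal C sketch of that pattern (this is not the customer's actual code; the directory path is taken from the strace below, everything else is assumed):

/* Sketch of the problematic pattern: inotify is initialized and
 * torn down again for every single event. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/inotify.h>

int main(void)
{
    const char *dir = "/mnt/tmp/msys_sim/QUEUES/Child_032"; /* path from the strace below */
    char buf[4096];

    for (;;) {
        int fd = inotify_init();                            /* initialize inotify   */
        if (fd < 0) { perror("inotify_init"); exit(1); }

        int wd = inotify_add_watch(fd, dir, IN_CREATE);     /* watch the directory  */
        if (wd < 0) { perror("inotify_add_watch"); exit(1); }

        ssize_t len = read(fd, buf, sizeof(buf));           /* receive one event    */
        if (len < 0) { perror("read"); exit(1); }

        /* process event: dequeue and unlink the new file (omitted here) */

        inotify_rm_watch(fd, wd);                           /* remove watch         */
        close(fd);                                          /* <-- this close is slow */
    }
}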
Strangely enough, the high system load comes from the close()
of the inotify fd:
inotify_init() = 4 <0.000020>
inotify_add_watch(4, "/mnt/tmp/msys_sim/QUEUES/Child_032", IN_CREATE) = 1 <0.059537>
write(1, "Child [032] sleeping\n", 21) = 21 <0.000012>
read(4, "\1\0\0\0\0\1\0\0\0\0\0\0\20\0\0\0SrcFile.b8tlfT\0\0", 512) = 32 <0.231012>
inotify_rm_watch(4, 1) = 0 <0.000044>
close(4) = 0 <0.702530>
open("/mnt/tmp/msys_sim/QUEUES/Child_032", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4 <0.000031>
lseek(4, 0, SEEK_SET) = 0 <0.000010>
getdents(4, /* 3 entries */, 32768) = 88 <0.000048>
getdents(4, /* 0 entries */, 32768) = 0 <0.000009>
write(1, "Child [032] dequeue [SrcFile.b8t"..., 37) = 37 <0.000011>
unlink("/mnt/tmp/msys_sim/QUEUES/Child_032/SrcFile.b8tlfT") = 0 <0.059298>
lseek(4, 0, SEEK_SET) = 0 <0.000011>
getdents(4, /* 2 entries */, 32768) = 48 <0.000038>
getdents(4, /* 0 entries */, 32768) = 0 <0.000009>
close(4) = 0 <0.000012>
inotify_init() = 4 <0.000020>
inotify_add_watch(4, "/mnt/tmp/msys_sim/QUEUES/Child_032", IN_CREATE) = 1 <0.040385>
write(1, "Child [032] sleeping\n", 21) = 21 <0.000903>
read(4, "\1\0\0\0\0\1\0\0\0\0\0\0\20\0\0\0SrcFile.mQgUSh\0\0", 512) = 32 <0.023423>
inotify_rm_watch(4, 1) = 0 <0.000012>
close(4) = 0 <0.528736>
What could possibly be causing the close() call to take such an enormous amount of time? I can identify two possibly relevant things: there is a directory /mnt/tmp/msys_sim/SOURCES, and each file that appears in /mnt/tmp/msys_sim/QUEUES/Child_032 is hardlinked to one in that directory. But SOURCES is never opened by the above process. Is it an artifact of using inotify wrongly? What can I point at to say "What you're doing is WRONG!"?
Output of perf top (I had been looking for this!):
Events: 109K cycles
70.01% [kernel] [k] _spin_lock
24.30% [kernel] [k] __fsnotify_update_child_dentry_flags
2.24% [kernel] [k] _spin_unlock_irqrestore
0.64% [kernel] [k] __do_softirq
0.60% [kernel] [k] __rcu_process_callbacks
0.46% [kernel] [k] run_timer_softirq
0.40% [kernel] [k] rcu_process_gp_end
Sweet! I suspect contention on a spinlock somewhere, and the entire system becomes highly latent when this happens.
Usually the pseudocode for an inotify loop would look like this:
initialize inotify
watch a directory or file for events
while (receive event) {
    process event
}
[ remove watch ]
close inotify fd
There is no need to remove the watch and reinitialize inotify on every loop iteration; set inotify up once and keep reading events from the same fd, as in the sketch below.
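A minimal C sketch of that structure, assuming the same IN_CREATE watch on the same directory as in the question (error handling kept short):

/* Sketch of the recommended structure: inotify is set up once and
 * the fd is read in a loop; a single read() may return several
 * packed events. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/inotify.h>

int main(void)
{
    const char *dir = "/mnt/tmp/msys_sim/QUEUES/Child_032"; /* path from the question */
    char buf[4096];

    int fd = inotify_init();                                /* initialize inotify once */
    if (fd < 0) { perror("inotify_init"); exit(1); }

    int wd = inotify_add_watch(fd, dir, IN_CREATE);         /* watch the directory     */
    if (wd < 0) { perror("inotify_add_watch"); exit(1); }

    for (;;) {
        ssize_t len = read(fd, buf, sizeof(buf));           /* block until events arrive */
        if (len <= 0) { perror("read"); break; }

        /* one read() can return several events back to back */
        for (char *p = buf; p < buf + len; ) {
            struct inotify_event *ev = (struct inotify_event *)p;
            if (ev->len)
                printf("new file: %s\n", ev->name);         /* process event */
            p += sizeof(struct inotify_event) + ev->len;
        }
    }

    inotify_rm_watch(fd, wd);   /* optional: watches are dropped on close anyway */
    close(fd);                  /* close inotify fd once, at shutdown            */
}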