I have a procedure I want to initiate only if several tests complete successfully.
One test I need is that all of my NFS mounts are alive and well.
Can I do better than the brute force approach:
mount | sed -n "s/^.* on \(.*\) type nfs .*$/\1/p" | while read -r mount_point ; do timeout 10 ls "$mount_point" > /dev/null 2>&1 || echo "stale $mount_point" ; done
Here timeout is a utility that runs the command in the background and kills it after a given time if no SIGCHLD was caught before the time limit, returning success/fail in the obvious way.
In English: parse the output of mount, then check (bounded by a timeout) every NFS mount point, optionally (not in the code above) breaking on the first stale mount.
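A minimal sketch of the optional "break on the first stale mount" variant. It assumes GNU coreutils timeout (which kills the command after the limit and exits with status 124) in place of the home-grown utility described above, and first_stale_mount is a hypothetical helper name. It reads mount points on stdin so it can sit at the end of the same pipeline:

```shell
#!/bin/sh
# Sketch: probe each mount point with a bounded ls and stop at the first failure.
# Assumes GNU coreutils `timeout`; first_stale_mount is a hypothetical name.

# first_stale_mount: reads one mount point per line on stdin.
# Prints "stale DIR" and returns 1 for the first DIR that fails or times out;
# returns 0 if every listed mount point answered.
first_stale_mount() {
    while IFS= read -r mount_point; do
        if ! timeout 10 ls "$mount_point" > /dev/null 2>&1; then
            echo "stale $mount_point"
            return 1    # break on the first stale mount
        fi
    done
    return 0
}
```

Usage, with the same sed filter as in the question: mount | sed -n "s/^.* on \(.*\) type nfs .*$/\1/p" | first_stale_mount || echo "not starting the procedure". Returning on the first failure is what lets the caller gate the procedure on a single exit status.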
Try restarting NFS first on the server and then on the clients; this may clear the stale file handles. Rebooting an NFS server while other machines have files open on it is not recommended, and it is especially problematic if an open file has been deleted on the server.
The answer is that any change in the mounted file's underlying inode, disk device, or inode generation number on the NFS server causes a stale NFS file handle (the ESTALE error).
Stale file handles are refreshed when the process reopens the file: the new open looks the path up again and picks up the file's new inode, provided the file still exists. In most cases the application has to do this itself; otherwise, it may have to be restarted.
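Because a stale handle usually makes stat fail right away with ESTALE, while a dead server makes it block, the two cases can be told apart. A minimal sketch, assuming GNU coreutils timeout and stat, and assuming the C library's ESTALE message is "Stale file handle" (current glibc) or "Stale NFS file handle" (older glibc); is_stale is a hypothetical name:

```shell
#!/bin/sh
# Sketch: tell a stale file handle (ESTALE) apart from other stat failures.
# Assumptions: GNU coreutils timeout and stat; the ESTALE message text is
# "Stale file handle" or "Stale NFS file handle". LC_ALL=C keeps it untranslated.
# is_stale is a hypothetical name.

# is_stale PATH: succeed only if stat on PATH fails with a stale-handle error.
is_stale() {
    # Keep stat's stderr, discard its stdout; a hang is cut off after 10 s
    # and simply counts as "not stale" here.
    err=$(LC_ALL=C timeout 10 stat "$1" 2>&1 >/dev/null) || :
    case $err in
        *'Stale file handle'* | *'Stale NFS file handle'*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A hang and a missing path both return 1 here, so this only complements a timeout check; it does not replace one.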
A colleague of mine ran into your script. This doesn't avoid the "brute force" approach, but if I may, in Bash:
while read _ _ mount _; do read -t1 < <(stat -t "$mount") || echo "$mount timeout"; done < <(mount -t nfs)
mount can list NFS mounts directly (mount -t nfs). read -t (a shell builtin) puts a time limit on waiting for a command's output; the command itself keeps running. stat -t (terse output) still hangs on a dead mount, just like ls*.
* ls yields unnecessary output, risks false positives on huge or slow directory listings, and requires permission to read the directory, which would also trigger a false positive if it doesn't have it.
while read _ _ mount _; do read -t1 < <(stat -t "$mount") || lsof -b 2>/dev/null|grep "$mount"; done < <(mount -t nfs)
We're using it with lsof -b (non-blocking, so lsof itself won't hang either) to determine the source of the hangs.
Thanks for the pointer!
test -d (a shell builtin) would work instead of stat (a standard external) as well, but read -t returns success only if it doesn't time out and reads a line of input. Since test -d doesn't write to stdout, read fails either way, so an extra (( $? > 128 )) errorlevel check would be needed to tell a timeout apart from normal completion. Not worth the legibility hit, IMO.
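For comparison, a minimal bash sketch of the test -d variant with the (( $? > 128 )) check; probe_dir is a hypothetical name and the one-second limit mirrors the one-liner:

```shell
#!/usr/bin/env bash
# Sketch (bash): the test -d variant with the $? > 128 errorlevel check.
# test -d prints nothing, so read normally hits EOF and returns 1; only a
# status above 128 means the 1-second timeout expired. probe_dir is hypothetical.

# probe_dir DIR: print "DIR timeout" and fail only if test -d did not finish in time.
probe_dir() {
    local mount=$1 rc=0
    read -r -t 1 _ < <(test -d "$mount") || rc=$?
    if (( rc > 128 )); then
        echo "$mount timeout"
        return 1
    fi
    return 0   # finished in time; test -d's own true/false result is not checked
}
```

Usage: while read _ _ mount _; do probe_dir "$mount"; done < <(mount -t nfs). The extra rc bookkeeping is the legibility cost mentioned above.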