
Is there a good way to detect a stale NFS mount?


I have a procedure I want to initiate only if several tests complete successfully.

One test I need is that all of my NFS mounts are alive and well.

Can I do better than the brute force approach:


mount | sed -n 's/^.* on \(.*\) type nfs .*$/\1/p' |
while read -r mount_point; do
    timeout 10 ls "$mount_point" > /dev/null 2>&1 || echo "stale $mount_point"
done

Here, timeout is a utility that runs the command in the background and kills it if no SIGCHLD is caught before the time limit, returning success or failure accordingly.


In English: parse the output of mount and check every NFS mount point, with each check bounded by a timeout. Optionally (not in the code above), break on the first stale mount.
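For illustration, the optional break-on-first-stale behaviour could be sketched roughly as below. This assumes Bash and the GNU coreutils timeout; nfs_mount_points and check_mounts are hypothetical names, and matching nfs4 as well as nfs is an added assumption that the one-liner above does not make.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read `mount`-style lines on stdin and print the
# mount point of every entry whose type is nfs or nfs<digits> (assumption).
nfs_mount_points() {
    sed -n 's/^.* on \(.*\) type nfs[0-9]* .*$/\1/p'
}

# Hypothetical helper: read mount points on stdin and probe each one with
# a time-limited ls. Stops at the first failure, prints it, and returns 1.
check_mounts() {
    local limit=${1:-10} mount_point
    while IFS= read -r mount_point; do
        if ! timeout "$limit" ls "$mount_point" > /dev/null 2>&1 < /dev/null; then
            echo "stale $mount_point"
            return 1
        fi
    done
    return 0
}
```

It would then gate the procedure with something like `mount | nfs_mount_points | check_mounts 10 && start_procedure`, where start_procedure is a placeholder for whatever should run. As in the original, a mount point that ls cannot read for any other reason is also reported as stale.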

Chen Levy asked Oct 29 '09 12:10




1 Answer

A colleague of mine came across your script. This doesn't avoid the "brute force" approach, but if I may, here it is in Bash:

while read _ _ mount _; do
    read -t1 < <(stat -t "$mount") || echo "$mount timeout"
done < <(mount -t nfs)

mount -t nfs lists NFS mounts directly. read -t (a shell builtin) puts a time limit on waiting for a command's output. stat -t (terse output) still hangs on a stale mount, just as ls does*. ls, on the other hand, yields unnecessary output, risks false positives on huge or slow directory listings, and needs permission to read the directory; lacking that permission would also trigger a false positive.
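To make the read -t behaviour concrete, here is a small sketch that needs no NFS at all. probe is a hypothetical helper name, sleep stands in for a stat that hangs on a stale mount, and stat -t is assumed to be the GNU coreutils version.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed only if the given command prints a line
# on stdout within $1 seconds. An empty result (EOF) and a timeout both fail.
probe() {
    local limit=$1
    shift
    read -r -t "$limit" _ < <("$@" 2> /dev/null)
}

probe 1 stat -t /            && echo "/ responded"
probe 1 sleep 2              || echo "sleep 2 timed out"
probe 1 stat -t /no/such/dir || echo "missing path also fails"
```

One thing to keep in mind: read -t only stops waiting. As far as I can tell, the command inside the process substitution is not killed, so a stat stuck on a stale mount may linger in the background after the check has moved on.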

while read _ _ mount _; do
    read -t1 < <(stat -t "$mount") || lsof -b 2>/dev/null | grep "$mount"
done < <(mount -t nfs)

We're using it with lsof -b (non-blocking, so it won't hang either) to determine the source of the hangs.

Thanks for the pointer!

* test -d (a shell builtin) would work in place of stat (a standard external command) as well, but read -t returns success only if it reads a line of input before timing out. Since test -d writes nothing to stdout, a (( $? > 128 )) exit status check would be necessary, which is not worth the legibility hit, IMO.
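For completeness, the test -d variant mentioned in the footnote might look roughly like this. timed_out is a hypothetical helper name, sleep again stands in for a hung mount, and the sketch relies on read returning a status greater than 128 on timeout, which the Bash manual documents but which I am assuming holds for the Bash version in use.

```shell
#!/usr/bin/env bash

# Hypothetical helper: true only if the command neither printed a line nor
# closed its stdout within $1 seconds, i.e. read itself hit the timeout.
timed_out() {
    local limit=$1 status
    shift
    read -r -t "$limit" _ < <("$@")
    status=$?
    # EOF from a command that finished silently gives status 1;
    # a timeout gives a status above 128.
    (( status > 128 ))
}

timed_out 1 test -d / || echo "test -d / finished in time"
timed_out 1 sleep 2   && echo "sleep 2 timed out"
```

Unlike the stat version, this does not care whether the path is a directory at all, only whether the check returned in time.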
astrostl answered Sep 21 '22 12:09