I'm writing a stress test suite for distributed file systems accessed over NFS.
In some cases, when one process deletes a file while another process attempts to read from it, I get a "Stale file handle" error (errno 116).
Is that kind of error expected and acceptable in such a race condition?
The test works as follows:
The file exists, as a successful stat operation shows:
controller_debug.log.2:2016-10-26 15:02:30,156;INFO - [LG-E27A-LNX:0xa]: finished 640522b4d94c453ea545cb86568320ca, result: success | stat | /JUyw481MfvsBHOm1KQu7sHRB6ffAXKjwIATlsXmOgWh8XKQaIrPbxLgAo7sucdAM/o6V266xE8bTaUGzk8YDMfDAJp0YIfbT4fIK1oZ2R20tRX3xFCvjISj7WuMEwEV41 | data: {} | 2016/10/26 15:02:30.156
Process 0x1 on client CLIENT-A completed a successful delete:
controller_debug.log.2:2016-10-26 15:02:30,164;INFO - [CLIENT-A:0x1]: finished 5f5dfe6a06de495f851745a78857eec1, result: success | delete | /JUyw481MfvsBHOm1KQu7sHRB6ffAXKjwIATlsXmOgWh8XKQaIrPbxLgAo7sucdAM/o6V266xE8bTaUGzk8YDMfDAJp0YIfbT4fIK1oZ2R20tRX3xFCvjISj7WuMEwEV41 | data: {} | 2016/10/26 15:02:30.161
3 milliseconds later, process 0xb on client CLIENT-B failed a "read" op due to "Stale file handle":
controller_debug.log.2:2016-10-26 15:02:30,164;INFO - [CLIENT-B:0xb]: finished e84e2064ead042099310af1bd44821c0, result: failed | read | /mnt/DIRSPLIT-node0.b27-1/JUyw481MfvsBHOm1KQu7sHRB6ffAXKjwIATlsXmOgWh8XKQaIrPbxLgAo7sucdAM/o6V266xE8bTaUGzk8YDMfDAJp0YIfbT4fIK1oZ2R20tRX3xFCvjISj7WuMEwEV41 | [errno:116] | Stale file handle | 142 | data: {} | 2016/10/26 15:02:30.160
controller_debug.log.2:2016-10-26 15:02:30,164;ERROR - Operation read FAILED UNEXPECTEDLY on File JUyw481MfvsBHOm1KQu7sHRB6ffAXKjwIATlsXmOgWh8XKQaIrPbxLgAo7sucdAM/o6V266xE8bTaUGzk8YDMfDAJp0YIfbT4fIK1oZ2R20tRX3xFCvjISj7WuMEwEV41 due to Stale file handle
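For reference, here is a simplified sketch of the racing operations. The path and worker names are illustrative only, not the actual test code, and in the real run each worker is a separate process on its own NFS client (CLIENT-A and CLIENT-B) mounting the same export:

import errno
import os

TEST_FILE = "/mnt/nfs/testdir/somefile"  # hypothetical path on the NFS mount


def delete_worker(path):
    """Runs on CLIENT-A: confirm the file exists, then delete it."""
    os.stat(path)    # "result: success | stat" in the log above
    os.unlink(path)  # "result: success | delete"


def read_worker(path):
    """Runs on CLIENT-B: read via a file handle cached from an earlier LOOKUP."""
    try:
        with open(path, "rb") as f:
            return f.read()
    except OSError as exc:
        if exc.errno == errno.ESTALE:  # errno 116, "Stale file handle"
            print("read failed: Stale file handle")
            return None
        raise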
Thanks
A stale file handle is refreshed when the process reopens the file: reopening re-resolves the pathname and obtains a new handle for the file's new inode, if one exists. In most cases the process must do this itself; otherwise it may have to be restarted.
What causes an NFS stale file handle error? Any change to a file's underlying inode, disk device, or inode generation on the NFS server invalidates the file handle the client has cached.
A filehandle becomes stale whenever the file or directory referenced by the handle is removed by another host, while your client still holds an active reference to the object.
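As a rough illustration of the "reopen to refresh" idea (a sketch in Python, not part of any NFS client code): retrying the open causes the pathname to be looked up again, so the client gets a fresh handle if a file still exists at that path, or a plain ENOENT if it was deleted.

import errno


def read_with_estale_retry(path, retries=1):
    """Read `path`; on ESTALE, close and reopen so the pathname is
    re-resolved and a new file handle is fetched from the server."""
    for attempt in range(retries + 1):
        try:
            with open(path, "rb") as f:  # the `with` block closes the stale descriptor
                return f.read()
        except OSError as exc:
            if exc.errno == errno.ESTALE and attempt < retries:
                continue  # the next open() triggers a fresh LOOKUP
            raise  # retries exhausted, or a different error (e.g. ENOENT)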
This is totally expected. The NFS specification is explicit about the use of file handles after an object (be it a file or a directory) has been deleted; Section 4 addresses this directly. For example:
The persistent filehandle will become stale or invalid when the file system object is removed. When the server is presented with a persistent filehandle that refers to a deleted object, it MUST return an error of NFS4ERR_STALE.
This is such a common problem that it even has its own entry in section A.10 of the NFS FAQ, which lists this as one common cause of ESTALE errors:
The file handle refers to a deleted file. After a file is deleted on the server, clients don't find out until they try to access the file with a file handle they had cached from a previous LOOKUP. Using rsync or mv to replace a file while it is in use on another client is a common scenario that results in an ESTALE error.
The expected resolution is that your client app must close and reopen the file to see what has happened. Or, as the FAQ says:
... to recover from an ESTALE error, an application must close the file or directory where the error occurred, and reopen it so the NFS client can resolve the pathname again and retrieve the new file handle.
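For a stress test, then, one reasonable policy is to classify ESTALE (or ENOENT, if the path is re-resolved) on a read as an expected outcome whenever another worker is known to have deleted the file in the same window, and to flag it as a real failure otherwise. A sketch with assumed names, not part of the original harness:

import errno

# Errors a reader may legitimately see when the file was deleted underneath it.
ACCEPTABLE_READ_ERRNOS = {errno.ESTALE, errno.ENOENT}


def classify_read_result(exc, concurrently_deleted):
    """Return 'success', 'expected-failure', or 'unexpected-failure'."""
    if exc is None:
        return "success"
    if (isinstance(exc, OSError)
            and exc.errno in ACCEPTABLE_READ_ERRNOS
            and concurrently_deleted):
        return "expected-failure"
    return "unexpected-failure"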