
Machine retains file exists/locks on client-side power outage

Our app runs on a client machine, 'server A', and creates a file on a Windows Server 2008 R2 file server using:

CreateFile(LockFileName,
                  GENERIC_READ or GENERIC_WRITE,
                  FILE_SHARE_READ, nil,
                  CREATE_ALWAYS,
                  FILE_FLAG_WRITE_THROUGH or FILE_FLAG_DELETE_ON_CLOSE,
                  0);

The client is testing a disaster scenario by powering off 'server A' and leaving it off. They report that our app running on 'server B', using the same filename and the same code fragment above, fails (i.e. the file continues to exist) for at least 15 minutes until, we believe, they browse to the folder containing the file in Windows Explorer, at which point the file is deleted automatically.

Is anyone aware of how this is supposed to behave? In this situation, where the creating server has gone away, should the handles be released and the file removed automatically? And why does looking at the file cause it to be deleted?

Interestingly, on another supposedly similar setup the issue does not occur.

asked Feb 03 '12 17:02 by Sam Cogan


1 Answer

[...] where the creating server has gone away, should the handles be released and the file removed automatically?

Eventually yes, but not immediately. You are running Windows Server 2008 R2 and thus SMBv2 (I assume that both the server and the client run Windows Server 2008 R2), so the client will request a durable file handle. According to the SMBv2 specification, sections 3.3.6.2 and 3.3.7.1, the server must start the durable open scavenger timer (set to 16 minutes on Windows Server by default) when the connection to the client is lost. Once the timer expires, the server must examine all open handles and close those that have not been reclaimed by a client.
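To make the scavenger rule concrete, here is a minimal Python sketch of the bookkeeping described above. It is a simplified model, not the SMB server's actual implementation: the `DurableOpen` class and the `scavenge` function are hypothetical names, and the 16-minute value is simply the default quoted above.

```python
from dataclasses import dataclass
from typing import List

# Default quoted in the answer for Windows Server; treat it as an assumption elsewhere.
SCAVENGER_TIMEOUT_S = 16 * 60


@dataclass
class DurableOpen:
    """Hypothetical stand-in for a server-side open whose connection was lost."""
    path: str
    disconnected_at: float   # seconds, when the server noticed the connection loss
    reclaimed: bool = False  # True if the client reconnected and reclaimed the handle


def scavenge(opens: List[DurableOpen], now: float,
             timeout: float = SCAVENGER_TIMEOUT_S) -> List[str]:
    """Close every unreclaimed open whose timer has expired; return the closed paths."""
    closed = []
    for o in list(opens):  # iterate over a copy because we remove from the original
        if not o.reclaimed and now - o.disconnected_at >= timeout:
            opens.remove(o)
            closed.append(o.path)
    return closed
```

With a single open disconnected at time 0, `scavenge(opens, 15 * 60)` closes nothing, while `scavenge(opens, 16 * 60)` closes it. Since the file in the question was created with `FILE_FLAG_DELETE_ON_CLOSE`, closing that last handle is also what removes the file.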

In your scenario of course, an open question is whether the server detects the connection loss to the client at all, as the client (i.e. the whole server, not just the process) according to your description is killed immediately.

Now assume that another client is trying to open the file while the durable timeout is still running/the server still considers the file to be open by the first client. Then it is supposed to send an oplock break notification (section 2.2.23.1) to the client that initially opened the file. As the client is unable to respond (it has been killed) the server will wait for the oplock break acknowledgment timeout to expire (section 3.3.2.1, 35 seconds by default) before it will grant the new client access to the file.
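The waiting rule for the second client can be sketched the same way. This is again a simplified model under the assumptions stated in this answer; `second_open_wait` is a hypothetical helper, and the 35-second value is the default quoted above.

```python
OPLOCK_BREAK_ACK_TIMEOUT_S = 35.0  # default quoted in the answer above


def second_open_wait(ack_delay_s=None, timeout_s=OPLOCK_BREAK_ACK_TIMEOUT_S):
    """Seconds the second client waits before the server stops waiting for the ack.

    ack_delay_s is how long the first client takes to acknowledge the oplock
    break; None means it never answers (for example, it was powered off).
    """
    if ack_delay_s is None or ack_delay_s > timeout_s:
        return timeout_s  # the server gives up waiting after the timeout
    return ack_delay_s    # the first client acknowledged in time
```

In the power-off case the first client never answers, so `second_open_wait(None)` yields the full timeout. By this model the second client should be delayed by well under a minute, which is why a 15-minute failure looks unexpected.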

There is one other thing to note: the behavior will be different if the second client accesses the file via a local path rather than via a UNC path. In this case the client won't have to wait for the oplock break ack timeout to expire. Windows will grant it access to the file immediately while it tries to send a close request to the first client.

This is how the system is supposed to behave. I cannot tell why you are experiencing the issues described. I wouldn't be surprised if you had stumbled upon a bug in the file server implementation of Windows Server 2008. I would try to troubleshoot the issue using the tools mentioned in the other answers (procmon is really nice); Wireshark helps a lot too.

answered Sep 22 '22 22:09 by afrischke