I have a directory change monitor process that reads updates from files within a set of directories. I have another process that performs small writes to a lot of files to those directories (test program). Figure about 100 directories with 10 files in each, and about 500 files being modified per second.
After running for a while, the directory monitor process hangs on a call to fclose()
in a method that is basically tailing the file. In this method, I fopen()
the file, check that the handle is valid, do a few seeks and reads, and then call fclose()
. These reads are all performed by the same thread in the process. After the hang, the thread never progresses.
I couldn't find any good information on why fclose()
might deadlock instead of returning some kind of error code. The documentation does mention _fclose_nolock()
, but it doesn't seem to be available to me (Visual Studio 2003).
The hang occurs for both debug and release builds. In a debug build, I can see that fclose()
calls _free_base()
, which hangs before returning. Some kind of call into kernel32.dll => ntdll.dll => KernelBase.dll => ntdll.dll is spinning. Here's the assembly from ntdll.dll that loops indefinitely:
77CEB83F cmp dword ptr [edi+4Ch],0
77CEB843 lea esi,[ebx-8]
77CEB846 je 77CEB85E
77CEB848 mov eax,dword ptr [edi+50h]
77CEB84B xor dword ptr [esi],eax
77CEB84D mov al,byte ptr [esi+2]
77CEB850 xor al,byte ptr [esi+1]
77CEB853 xor al,byte ptr [esi]
77CEB855 cmp byte ptr [esi+3],al
77CEB858 jne 77D19A0B
77CEB85E mov eax,200h
77CEB863 cmp word ptr [esi],ax
77CEB866 ja 77CEB815
77CEB868 cmp dword ptr [edi+4Ch],0
77CEB86C je 77CEB87E
77CEB86E mov al,byte ptr [esi+2]
77CEB871 xor al,byte ptr [esi+1]
77CEB874 xor al,byte ptr [esi]
77CEB876 mov byte ptr [esi+3],al
77CEB879 mov eax,dword ptr [edi+50h]
77CEB87C xor dword ptr [esi],eax
77CEB87E mov ebx,dword ptr [ebx+4]
77CEB881 lea eax,[edi+0C4h]
77CEB887 cmp ebx,eax
77CEB889 jne 77CEB83F
Any ideas what might be happening here?
In the C Programming Language, the fclose function closes a stream pointed to by stream. The fclose function flushes any unwritten data in the stream's buffer.
Notes: The storage pointed to by the FILE pointer is freed by the fclose() function.
Calling fclose will ensure the file descriptor is properly disposed of and output buffers flushed so the data written to the file will be present in the file on disk. fclose() closes the associated file.
I posted this as a comment, but I realize this could be an answer in its own right...
Based on the disassembly, my guess is you've overwritten some internal heap structure maintained by ntdll
, and it is looping forever iterating through a linked list.
In particular at the start of the loop, the current list node seems to be in ebx
. At the end of the loop, the expected last node (or terminator, if you like -- it looks a bit like these are circular lists and the last node is the same as the first, pointer to this node being at [edi+4Ch]
) is contained in eax
. Probably the result of cmp ebx, eax
is never equal, because there is some cycle in the list introduced by a heap corruption.
I don't think this has anything to do with locks, otherwise we would see some atomic instructions (eg. lock cmpxchg
, xchg
, etc.) or calls to other synchronization functions.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With