Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

When is a divide by zero not a divide by zero? A puzzle in the debugger (static variable issues)

I'm very confused and I think my debugger is lying to me. I have the following loop in my code:

MyClass::UploadFile(CString strFile)
{
  ...
  static DWORD dwLockWaitTime = EngKey::GetDWORD(DNENG_SERVER_UPLOAD_LOCK_WAIT_TIME, DNENG_SERVER_UPLOAD_LOCK_WAIT_TIME_DEFAULT);
  static DWORD dwLockPollInterval = EngKey::GetDWORD(DNENG_SERVER_UPLOAD_LOCK_POLL_INTERVAL, DNENG_SERVER_UPLOAD_LOCK_POLL_INTERVAL_DEFAULT);

  LONGLONG llReturnedOffset(0LL);
  BOOL bLocked(FALSE);
  for (DWORD sanity = 0; (sanity == 0 || status == RESUMABLE_FILE_LOCKED) && sanity < (dwLockWaitTime / dwLockPollInterval); sanity++) 
    {
      ...

This loop has been executed hundreds of times during the course of my program and the two static variables are not changed anywhere in the code, they're written to just once when they're statically initialized and read from in the loop conditions and in one other place. Since they're user settings which are read from the Windows registry they almost always have the constant values of dwLockWaitTime = 60 and dwLockPollInterval = 5. So the loop is always doing 60 / 5.

Very rarely, I get a crash dump which shows that this line of code has thrown a division by zero error. I've checked what WinDbg says and it shows:

FAULTING_IP: 
procname!CServerAgent::ResumableUpload+54a [serveragent.cpp @ 725]
00000001`3f72d74a f73570151c00    div     eax,dword ptr [proc!dwLockPollInterval (00000001`3f8eecc0)]

EXCEPTION_RECORD:  ffffffffffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 000000013f72d74a (proc!CServerAgent::ResumableUpload+0x000000000000054a)
   ExceptionCode: c0000094 (Integer divide-by-zero)
  ExceptionFlags: 00000000
NumberParameters: 0

ERROR_CODE: (NTSTATUS) 0xc0000094 - {EXCEPTION}  Integer division by zero.

I've checked the assembler code and it shows that the crash occurred on this div instruction.

00000001`3f72d744 8b0572151c00    mov     eax,dword ptr [dwLockWaitTime (00000001`3f8eecbc)]
00000001`3f72d74a f73570151c00    div     eax,dword ptr [dwLockPollInterval (00000001`3f8eecc0)]

So as you can see the value at 000000013f8eecbc was moved into eax and then eax was divided by the value at 000000013f8eecc0.

What is at those two values you ask?

0:048> dd 00000001`3f8eecbc
00000001`3f8eecbc  0000003c 00000005 00000001 00000000
00000001`3f8eeccc  00000000 00000002 00000000 00000000
00000001`3f8eecdc  00000000 7fffffff a9ad25cf 7fffffff
00000001`3f8eecec  a9ad25cf 00000000 00000000 00000000
00000001`3f8eecfc  00000000 00000000 00000000 00000000
00000001`3f8eed0c  00000000 00000000 00000000 00000000
00000001`3f8eed1c  00000000 00000000 00000000 00000000
00000001`3f8eed2c  00000000 00000000 00000000 00000000
0:048> dd 000000013f8eecc0
00000001`3f8eecc0  00000005 00000001 00000000 00000000
00000001`3f8eecd0  00000002 00000000 00000000 00000000
00000001`3f8eece0  7fffffff a9ad25cf 7fffffff a9ad25cf
00000001`3f8eecf0  00000000 00000000 00000000 00000000
00000001`3f8eed00  00000000 00000000 00000000 00000000
00000001`3f8eed10  00000000 00000000 00000000 00000000
00000001`3f8eed20  00000000 00000000 00000000 00000000
00000001`3f8eed30  00000000 00000000 00000000 00000000

The constants 60 and 5 exactly as I'd expect. So where's the divide by zero??? Is my debugger lying? Surely the divide by zero has been thrown by the hardware so it can't have made a mistake about that? And if it was a divide by zero in a different place in my code what are the odds that the debugger would show the instruction pointer in exactly this place? I confess, I'm stumped..

like image 810
Benj Avatar asked Jan 22 '15 19:01

Benj


People also ask

What type of error will happen if a program attempts to divide a number by zero?

Any number divided by zero gives the answer “equal to infinity.” Unfortunately, no data structure in the world of programming can store an infinite amount of data. Hence, if any number is divided by zero, we get the arithmetic exception .

What happens when you attempt a division by zero in C?

In languages like C, C++ etc. division by zero invokes undefined behaviour.

Does C++ allow you to divide by 0?

Handling the Divide by Zero Exception in C++Dividing a number by Zero is a mathematical error (not defined) and we can use exception handling to gracefully overcome such operations.


1 Answers

Since the code is part of a member function, and you're calling this function from multiple threads, the static variables are not thread-safe if using a compiler that does not conform to C++ 11 standards. Thus you may get data races when initializing those two static variables.

For a C++ 11 standard conforming compiler, static variables are now guaranteed to be initialized by the first thread, while subsequent threads wait until the static is initialized.

For Visual Studio 2010 and below, static local variables are not guaranteed to be thread safe, since these compilers conform to the C++ 03 and C++ 98 standard.

For Visual Studio 2013, I am not sure of the level of C++ 11 support in terms of static local initialization. Therefore, for Visual Studio 2013, you may have to use proper synchronization to ensure that static local variables are initialized correctly.

For Visual Studio 2015, this item has been addressed and proper static local initialization is fully implemented, so the code you currently have should work correctly for VS 2015 and above.


Edit: For Visual Studio 2013, static local thread-safe initialization is not implemented ("Magic Statics"), as described here.

Therefore, we can cautiously verify that the reason for the original problem is the static-local initialization issue and threading. So the solution (if you want to stick with VS 2013) is to use proper synchronization, or redesign your application so that static variables are no longer needed.

like image 95
PaulMcKenzie Avatar answered Sep 18 '22 05:09

PaulMcKenzie