Edit: See the end of the question for an update on the answer.
I have spent several weeks tracking down a very odd bug in a piece of software I maintain. Long story short, there is an old piece of software that is in distribution, and a new piece of software that needs to match the output of the old. The two rely (in theory) on a common library.[1] However, I cannot duplicate the results being generated by the original version of the library, even though the source for the two versions of the library matches. The actual code in question is very simple. The original version looked like this (the "voodoo" comment isn't mine):[2]
// float rstr[101] declared and initialized elsewhere as a global
void my_function() {
    // I have elided several declarations not used until later in the function
    double tt, p1, p2, t2;
    char *ptr;
    ptr = NULL;
    p2 = 0.0;
    t2 = 0.0; /* voooooodoooooooooo */
    tt = (double) rstr[20];
    p1 = (double) rstr[8];
    // The code goes on and does lots of other things ...
}
The last statement I have included is where different behavior crops up. In the original program, rstr[8] has the value 101325., and after casting it to double[3] and assigning it, p1 has the value 101324.65625. Similarly, tt ends up with the value 373.149999999996. I have confirmed these values with both debug prints and examining the values in the debugger (including checking the hex values). This is not surprising in any sense, it is as expected with floating point values.
In a test wrapper around the same version of the library (as well as in any call to a refactored version of the library), the first assignment (to tt) produces the same results. However, p1 ends up as 101325.0, matching the original value in rstr[8]. This difference, while small, sometimes produces substantial variations in calculations that depend on the value of p1.
My test wrapper was simple, and matched the inclusion pattern of the original exactly, but eliminated all other context:
#include "the_header.h"
float rstr[101];
int main() {
    rstr[8] = 101325.;
    rstr[20] = 373.15;
    my_function();
}
Out of desperation, I have even gone to the trouble of looking at the disassembly generated by VC6.
4550: tt = (double) rstr[20];
0042973F fld dword ptr [rstr+50h (006390a8)]
00429745 fstp qword ptr [ebp-0Ch]
4551: p1 = (double) rstr[8];
00429748 fld dword ptr [rstr+20h (00639078)]
0042974E fstp qword ptr [ebp-14h]
The version generated by VC6 for the same library function when called by the test code wrapper (which matches the version generated by VC6 for my refactored version of the library):
60: tt = (double) rstr[20];
00408BC8 fld dword ptr [_rstr+50h (0045bc88)]
00408BCE fstp qword ptr [ebp-0Ch]
61: p1 = (double) rstr[8];
00408BD1 fld dword ptr [_rstr+20h (0045bc58)]
00408BD7 fstp qword ptr [ebp-14h]
The only difference I can see, besides where in memory the array is stored and how far along through the program this is occurring, is the leading _ on the reference to rstr in the second. In general, VC6 uses a leading underscore for name-mangling with functions, but I cannot find any documentation of it doing name-mangling with array pointers. Nor can I see why these would produce different results in any case, unless that name-mangling is involved with reading the data accessed from the pointers in a different way.
The only other difference I can identify between the two (apart from calling context) is that the original is an MFC-based Win32 application, while the latter is a non-MFC console application. The two are otherwise configured the same way, and they are built with identical compilation flags and against the same C runtime.
Any suggestions would be much appreciated.
Edit: the solution, as several answers very helpfully pointed out, was to examine the binary/hex values and compare them to make sure the things I thought were exactly the same in fact were the same. This proved not to be the case—my strong protestations to the contrary notwithstanding.
Here I get to eat some humble pie and admit that while I thought I had checked those values, I had in fact checked some other, closely related values, a point I discovered only when I went back to look at the data again. As it turned out, the values being set in rstr[8] were very slightly different, and so the conversion to double highlighted the very slight differences, and these differences then propagated throughout the program in just the way I noted.
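To illustrate how two different float values can pass for the same number, here is a minimal sketch. It assumes a C99 compiler (VC6 itself only offers _snprintf) and IEEE 754 single precision. The helper names format_float and look_equal are made up for this example and are not part of the library in question.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format f with the given number of significant digits into buf. */
int format_float(char *buf, size_t size, float f, int digits)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%.*g", digits, (double)f);
}

/* Return 1 if a and b print identically at `digits` significant digits. */
int look_equal(float a, float b, int digits)
{
    char sa[64], sb[64];
    format_float(sa, sizeof sa, a, digits);
    format_float(sb, sizeof sb, b, digits);
    return strcmp(sa, sb) == 0;
}
```

With the values from the question, look_equal(101325.0f, 101324.65625f, 6) returns 1 because both print as 101325 at the default six significant digits, while asking for nine digits tells them apart. The widening to double is exact, so it does not change the value; it only tends to be printed with more digits.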
The discrepancy with the initialization I can explain based on the way the two programs work. Specifically, in one case rstr[8] is specified based on a user input to a GUI (and is in this case also the product of a conversion calculation), whereas in another, it is read in from a file where it has been stored with some loss of precision. Interestingly, in neither case was it actually exactly 101325.0, even the case in which it was read from a file where it had been stored as 1.01325e5.
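As a rough sketch of how storing a float as text can lose precision, the following writes a value with a chosen number of significant digits and parses it back. The function name round_trip and the choice of %e formatting are assumptions made for illustration; the actual file format used by the program is not shown in the question.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Write f as text with `digits` significant digits, then parse it back.
   Hypothetical helper: it stands in for a save-to-file / load-from-file pair. */
float round_trip(float f, int digits)
{
    char text[64];
    int n = snprintf(text, sizeof text, "%.*e", digits - 1, (double)f);
    assert(n > 0 && (size_t)n < sizeof text);
    return (float)strtod(text, NULL);
}
```

With seven significant digits, round_trip(101324.65625f, 7) does not return the value that went in, whereas nine significant digits are enough to recover any IEEE 754 single-precision value exactly.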
This will teach me to double check my double checking of these sorts of things. Many thanks to Eric Postpischil and unwind for prompting me to check it again and for the prompt feedback. It was very helpful.
[1] […] #include and the functions referenced via extern statements. I have fixed this in a refactored version of the library that is actually a library, but see the rest of the question.
[2] […] /* voooooodoooooooooo */ comment because it illustrates the… unusual… programming practices of my predecessor. I think that element is present because this was originally translated from Fortran and the developer had used it as a means of dealing with some sort of memory bug. The line has no effect whatsoever on the actual behavior of the code.
This:
In the original program, rstr[8] has the value 101325., and after casting it to double[3] and assigning it, p1 has the value 101324.65625
implies that the float value is not, in fact, exactly 101325.0, so when you convert to double you see more of the precision. I would (highly) suspect the method by which you inspect the float value: automatic (implicit and silent) rounding when printing is very common with floats. Inspect the bit pattern and decode it using the known format of the float on your system, to make sure you're not being tricked.
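A minimal sketch of that inspection, assuming a 32-bit IEEE 754 float and a compiler that provides <stdint.h> (VC6 does not, so an unsigned long would have to stand in there). The names float_bits and dump_float are invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Return the raw bits of a float. Assumes float is 32 bits wide. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    assert(sizeof bits == sizeof f);
    memcpy(&bits, &f, sizeof bits);  /* copy bytes instead of casting pointers */
    return bits;
}

/* Print the value with 9 significant digits plus the decoded fields:
   1 sign bit, 8 biased exponent bits, 23 fraction bits. */
void dump_float(const char *label, float f)
{
    uint32_t bits = float_bits(f);
    printf("%s: %.9g  bits=0x%08lX  sign=%lu  exponent=%lu  fraction=0x%06lX\n",
           label, (double)f, (unsigned long)bits,
           (unsigned long)(bits >> 31),
           (unsigned long)((bits >> 23) & 0xFFu),
           (unsigned long)(bits & 0x7FFFFFu));
}
```

Calling dump_float("rstr[8]", rstr[8]) in both programs just before the assignment to p1 would show whether the stored values really match. For reference, 101325.0f has the bit pattern 0x47C5E680 and 101324.65625f has 0x47C5E654, so the two differ only in the low bits of the fraction.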