
How does C# implicitly cast terms of integral types to terms of double?

Tags:

c#

Are there any C# specifications that state how implicit conversions of terms of integral types (e.g., int) to terms of double are supposed to work? If so, can someone tell me the algorithm or direct me to it?

The C# 6.0 draft specification states “the value of a real literal of type float or double is determined by using the IEEE ‘round to nearest’ mode” under Lexical structure -> Grammars -> Lexical grammar -> Lexical analysis -> Tokens -> Literals -> Real literals; however, I wasn’t able to find anything about how implicit conversions work.

The only thing I found under Conversions -> Implicit conversions -> Implicit numeric conversions in the same specification was “conversions from int, uint, long, or ulong to float and from long or ulong to double may cause a loss of precision, but will never cause a loss of magnitude.”

I do know that implicit conversions don’t follow the same algorithm that real literals do, as the program below illustrates*:

using System;
using System.Diagnostics;
namespace App
{
    internal static class Program
    {
        internal static void Main()
        {
            Debug.Assert(GetFirstByte(10648738977740919977) != GetFirstByte(10648738977740919977d));
        }
        private static byte GetFirstByte(double val)
        {
            return BitConverter.GetBytes(val)[0];
        }
    }
}
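
If it helps, the same comparison can be done on the full 64-bit pattern rather than just the first byte. The following variation is only illustrative (the helper name GetBits is mine, not part of the original program):

using System;

namespace App
{
    internal static class Program
    {
        internal static void Main()
        {
            // The first argument is a ulong that is implicitly converted to double;
            // the second is a real literal whose value is determined during lexical analysis.
            Console.WriteLine(GetBits(10648738977740919977));
            Console.WriteLine(GetBits(10648738977740919977d));
        }

        private static string GetBits(double val)
        {
            // Hexadecimal view of the IEEE 754 binary64 bit pattern.
            return BitConverter.DoubleToInt64Bits(val).ToString("X16");
        }
    }
}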

Edit

The above code may be more “complicated” than it needs to be. Here is another program that should hopefully clarify what I am asking.

using System;
using System.Diagnostics;
namespace App
{
    internal static class Program
    {
        internal static void Main()
        {
            Debug.Assert(10648738977740919977 != 10648738977740919977d);
        }
    }
}
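
If it is useful to separate the compiler’s constant folding from the runtime’s conversion, a sketch along these lines (the helper ToDouble and the NoInlining usage are mine, purely for illustration) prints the bit patterns produced by the folded constant conversion, by a conversion forced to happen at run time, and by the real literal:

using System;
using System.Runtime.CompilerServices;

namespace App
{
    internal static class Program
    {
        internal static void Main()
        {
            const ulong value = 10648738977740919977;

            double folded = value;                   // constant conversion, likely folded by the compiler
            double runtime = ToDouble(value);        // conversion performed at run time
            double literal = 10648738977740919977d;  // real literal, value fixed during lexical analysis

            Console.WriteLine(BitConverter.DoubleToInt64Bits(folded).ToString("X16"));
            Console.WriteLine(BitConverter.DoubleToInt64Bits(runtime).ToString("X16"));
            Console.WriteLine(BitConverter.DoubleToInt64Bits(literal).ToString("X16"));
        }

        // NoInlining keeps the JIT from folding the conversion into the call site.
        [MethodImpl(MethodImplOptions.NoInlining)]
        private static double ToDouble(ulong val)
        {
            return val;
        }
    }
}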

Addendum

As The General and mjwills stated in the comments, this is almost certainly due to the extended precision format that some ISAs like x86 offer. Why the .NET Core compiler relies on the extended format to convert the ulong to a double but doesn’t do the same for the real literal is beyond me. I am not sure whether this is technically a “bug”, but it would be nice if both did the same thing. One can be compliant with the above specification and still use the extended format, since IEEE 754-2019 explicitly allows for more than 64 bits of precision. Anyway, the ulong value can fit entirely in the 64-bit significand of x86’s extended format, so no rounding occurs there.
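
For concreteness (my own sketch, not from the comment discussion): a double carries a 53-bit significand, so for values in [2^63, 2^64) adjacent doubles are 2^11 = 2048 apart, which is the precision that can be lost when converting a ulong in that range, whereas the 64-bit significand of the extended format holds any ulong exactly. A quick check of the spacing:

using System;

namespace App
{
    internal static class Program
    {
        internal static void Main()
        {
            double d = 10648738977740919977d;
            // Distance from d to the next representable double (one ulp).
            // For values in [2^63, 2^64) this should be 2^(63 - 52) = 2048, i.e.,
            // converting a ulong in this range can discard up to 11 low-order bits.
            Console.WriteLine(Math.BitIncrement(d) - d);
        }
    }
}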

TL;DR (aka Edit 2)

I’ll preface this edit with the fact that I am fundamentally and philosophically against the notion that what I am about to write is necessary or even desirable. I believe technical questions that are specific to a particular programming language, like this one, still belong on Stack Overflow rather than on any of the other Stack Exchange sites (e.g., Computer Science, Theoretical Computer Science, Mathematics, or MathOverflow for things like homotopy type theory). This means that a question about the nitty-gritty details of something is still worthwhile, even if one may (incorrectly) perceive such details as leading to a violation of “best practices”. If there exists a more fundamental problem, then a separate question can be asked about it.

Background

I am creating a 128-bit unsigned integer type, U128, at my job, where we write in VB.NET. I decided to implement the ability to explicitly cast U128 terms to Double (i.e., double in C# parlance) terms. IEEE 754 binary64 and binary32 are rather trivial formats, as they are almost identical to how real numbers are written in base-10 scientific notation; of course, they must be encoded as finite sequences of bits and use biased exponents. Anyway, I first implemented the cast in Rust, since Rust has a native 128-bit unsigned integer type, u128, and The Rustonomicon explicitly states how casts from u128 terms to f64 terms behave. This allowed me to test my algorithm against Rust’s; and unsurprisingly, given the trivial nature of the algorithm (it is roughly 12 lines of code), my implementation matched Rust’s for several edge cases and 1 billion randomly generated numbers. No, I did not take the time to formally verify that my algorithm was correct.

I then ported my algorithm to VB.NET. Knowing how much more popular C# is here, I rewrote it in C# as well and confirmed it had the same behavior, but I wanted to be confident that nothing got lost in translation. The best I could do was to compare casts of ULong (ulong in C#) terms to Double terms with casts of the equivalent ULongs, represented as U128s, to Doubles. Sure enough, I was dismayed when I discovered that 10648738977740919977UL was being cast differently than the equivalent U128. I (correctly) assumed there was a problem with the rounding. (FYI, the C# specification does not say how to round values that lie exactly halfway between two representable doubles; but, as expected, it rounds to even.) When I compared the first byte (I am using a little-endian CPU) of the Double that my cast created with that of Rust’s, I found that mine was correct. At this point I assumed there was something “fishy” with VB.NET (and later confirmed the same in C#), since I typically trust Rust more and, as previously stated, the algorithm is rather trivial.

Fortunately, I was not aware of the (unfortunate) quirk that C# allows programs to use extended precision capabilities on CPUs that have them, including non-compliant ones like x86-based CPUs with their 80-bit extended format. Had I known that, I likely would have dropped the investigation.

It was not until I examined the first byte of the Double term 10648738977740919977R (10648738977740919977d in C#) that I was truly befuddled, as I found that it did agree with my algorithm. How could this be? I used the exact same machine, compiled with the same compiler, for the same platform. Finally, I correctly surmised that there is likely a difference between how the lexer treats real literals and how integral literals that are cast to Doubles are treated. To test this theory, I hacked up the program in the initial post (in VB.NET at the time).

At this point, I assumed that implicit casts were using a different algorithm (perhaps for efficiency reasons, since one has to track 3 additional bits to know how to round properly). That is why my question was formulated the way it was. I wanted to know the algorithm so that my algorithm would align with it (even though my initial algorithm is, very likely, technically correct per IEEE 754).
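
For reference, here is a minimal sketch of the kind of algorithm I mean for the simpler 64-bit case (my own illustration, not the conversion the compiler or runtime actually uses): keep the top 53 bits of the ulong and use the discarded bits (the role played by the guard, round, and sticky bits) to round to nearest, ties to even.

using System;
using System.Numerics;

namespace App
{
    internal static class Program
    {
        internal static void Main()
        {
            ulong val = 10648738977740919977;
            Console.WriteLine(BitConverter.DoubleToInt64Bits(ToDouble(val)).ToString("X16"));
            Console.WriteLine(BitConverter.DoubleToInt64Bits((double)val).ToString("X16"));
        }

        // Converts a ulong to a double using round to nearest, ties to even,
        // by building the IEEE 754 binary64 bit pattern directly.
        private static double ToDouble(ulong val)
        {
            if (val == 0)
            {
                return 0d;
            }

            int msb = 63 - BitOperations.LeadingZeroCount(val); // index of the highest set bit
            int exponent = msb;                                 // unbiased binary exponent
            ulong mantissa;

            if (msb <= 52)
            {
                // At most 53 significant bits: the value is exactly representable.
                mantissa = val << (52 - msb);
            }
            else
            {
                int dropped = msb - 52;                          // number of low bits that do not fit
                mantissa = val >> dropped;                       // top 53 bits, implicit leading 1 included
                ulong remainder = val & ((1UL << dropped) - 1);  // the bits being discarded
                ulong half = 1UL << (dropped - 1);

                // Round to nearest; on an exact tie, round to the even mantissa.
                if (remainder > half || (remainder == half && (mantissa & 1) != 0))
                {
                    mantissa++;
                    if (mantissa == (1UL << 53))                 // rounding carried out of the significand
                    {
                        mantissa >>= 1;
                        exponent++;
                    }
                }
            }

            ulong bits = ((ulong)(exponent + 1023) << 52) | (mantissa & ((1UL << 52) - 1));
            return BitConverter.Int64BitsToDouble((long)bits);
        }
    }
}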

Luckily, with the eventual help of mjwills, The General, and NetMage, it was discovered that the cause likely lay with the non-compliant extended precision capabilities of my CPU; although the fact that this happens at compilation time is fundamentally different from previous posts that highlighted runtime discrepancies.

I encourage everyone to take the time to read the amazing answer and comments by tannergooding in the link in the answer I eventually posted (including forking over the $15 to read the formal proof about when extended precision capabilities are OK and what they require).

* Compiled with Microsoft Visual C# Compiler version 3.7.0-6.20459 for .NET Core 3.1 on Windows 10 Pro 18363.1139 on an Intel Core i7-6600U CPU.

asked Oct 26 '20 by philomathic_life



1 Answer

The algorithm is IEEE 754 Round to nearest, ties to even. The “counterexamples” to this shown in the question are in fact proofs of a bug which has been accepted by the Roslyn team. The bug is caused by the “runtime conversion implementation”.

answered Nov 09 '22 by philomathic_life