Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why is SQL Server Big Endian?

From what I've read, all Windows versions and .NET are little endian. So why the departure from the Microsoft norm for SQL Server?

What I mean by "SQL Server is big endian" is this:

SELECT CONVERT(VARBINARY, 255);

gives:

0x000000FF

and not

0xFF000000

the way something like .NET's BitConverter.GetBytes() does. I guess SQL Server could be storing the number internally as little endian and then CONVERT is just switching it around for some reason. But either way, why?

Edit:

Just noticed this...

DECLARE @q UNIQUEIDENTIFIER = '01234567-89ab-cdef-0123-456789abcdef';
SELECT @q;
SELECT CONVERT(VARBINARY, @q);

gave me:

01234567-89AB-CDEF-0123-456789ABCDEF

0x67452301AB89EFCD0123456789ABCDEF

What the heck?

like image 970
Atario Avatar asked Feb 15 '14 04:02

Atario


People also ask

Why is big-endian used?

For people who use languages that read left-to-right, big-endian seems like the natural way to think of storing a string of characters or numbers -- in the same order you expect to see it presented to you. In this way, many people consider big-endian as storing something in forward fashion, just as they read.

Why does endianness exist?

Another reason it exists is because it seems that it wasn't standardized back in the 1960s and 1970s; some companies (such as Intel with their x86 architecture) decided to go with little-endian (possibly due to the optimization reasoning above), whereas other companies selected big-endian.

Why do we need endian?

If my computer reads bytes from left to right, and your computer reads from right to left, we're going to have issues when we need to communicate. Endianness means that the bytes in computer memory are read in a certain order. We won't have any issues if we never need to share information.

What systems use big-endian?

In the SAS System, the following platforms are considered big endian: IBM mainframe, HP-UX, AIX, Solaris, and Macintosh. The following platforms are considered little endian: VAX/VMS, AXP/VMS, Digital UNIX, Intel ABI, OS/2, and Windows.


1 Answers

Yes: Windows and .NET are Little Endian.

So why is SQL Server Big Endian? Easy: it's not ;-). The MSDN page for Collation and Unicode Support (within SQL Server) even states:

Because the Intel platform is a little endian architecture, Unicode code characters are always stored byte-swapped.

So why do you get a Big Endian binary value when converting the Int value of 255? Here is where the confusion is. This question is flawed because it is based on a false premise: that you should see the endianness of hardware and/or software reflected in the converted value. But why would you? Endianness affects the internal representation of a value, how it is stored. But it doesn't change the thing itself. You can convert a DATETIME to an INT and you will see an Integer. But if you save that Integer in an INT field, it will be stored as 4 bytes in reverse order since this is a Little Endian system. But that has nothing to do with what you see when you request that value back from the system and it is displayed to you.

For example, run the following to see that converting the INT value of 301 to a BINARY(2) results in 0x012D, because 0x012D = 301, just in hexadecimal. And so converting 0x012D back to INT returns 301, as expected. If the original Int to Binary conversion gave you 0x2D01, well, that does not equate to 301.

SELECT CONVERT(BINARY(2), 301), CONVERT(INT, 0x012D)
-- 0x012D,  301

HOWEVER, if you create a table with an INT column, and insert a value of "301" into that column, and use DBCC PAGE to view the data page as it exists on disk, you will see the following hex digits in the order shown:

2D 01 00 00

Also, to address some of the evidence supporting the premise of the question:

Yes, doing BitConverter.ToString(BitConverter.GetBytes(255)) in .NET will return:

FF-00-00-00

BUT, that is not a conversion as GetBytes() is not converting the "value" but is instead intending on showing the internal system representation, which changes depending on if the system is Little Endian or Big Endian. If you look at the MSDN page for BitConverter.GetBytes, it might be clearer as to what it is actually doing.

When converting actual values, the results won't (and can't) be different across different systems. An integer value of 256 will always be 0x0100 across all systems (even calculators) because Endianness has nothing to do with how you convert values between base 10, base 2, base 16, etc.

In .NET, if you want to do this conversion, you can use String.Format("{0:X8}", 255) which will return:

000000FF

which is the same as what SELECT CONVERT(BINARY(4), 255); returns since they are both converting the value. This result isn't being shown as Big Endian, but is being shown as it truly is, which just happens to match the byte ordering of Big Endian.

In other words, when starting with a bit sequence of 100000000, that can be represented in decimal form as 256, or in hexadecimal form (known as BINARY / VARBINARY within SQL Server) as 0x0100. Endianness has nothing to do with this as these are merely different ways of representing the same underlying value.

Further evidence of SQL Server being Little Endian can be seen when converting between VARBINARY and NVARCHAR. Since NVARCHAR is a 16-bit (i.e. 2 byte) encoding, we can see the byte ordering since there is no numeric equivalent for characters (unlike the 256 -> 0x0100 example) and so there is really nothing else to show (showing Code Point values is not an option due to Supplementary Characters).

As you can see below, a Latin capital A, which has a Code Point of U+0041 (which is numerically the same as 65) converts to a VARBINARY value of 0x4100, because that is the UTF-16 Little Endian encoded value of that character:

SELECT CONVERT(VARBINARY(10), N'A'), -- 0x4100
       CONVERT(NVARCHAR(5), 0x4100), -- A
       CONVERT(INT, 0x4100),         -- 16640
       UNICODE(N'A'),                -- 65
       CONVERT(VARBINARY(8), 65);    -- 0x00000041

SELECT CONVERT(VARBINARY(10), N'ᄀ'), -- 0x0011
       CONVERT(NVARCHAR(5), 0x0011),  -- ᄀ
       CONVERT(INT, 0x0011),          -- 17
       UNICODE(N'ᄀ'),                -- 4352
       CONVERT(VARBINARY(8), 4352);   -- 0x00001100

Also, the "Pile of Poo" emoji (Code Point U+01F4A9) can be seen using the Surrogate Pair "D83D + DCA9" (which the NCHAR function allows for), or you can inject the UTF-16 Little Endian byte sequence:

SELECT NCHAR(0xD83D) + NCHAR(0xDCA9) AS [SurrogatePair],
       CONVERT(NVARCHAR(5), 0x3DD8A9DC) AS [UTF-16LE];
-- 💩   💩

UNIQUEIDENTIFIER is similar in that "what it is" and "how it is stored" are two different things and they do not need to match. Keep in mind that UUID / GUID is not a basic datatype like int or char, but is more of an entity that has a defined format, just like JPG or MP3 files. There is more discussion about UNIQUEIDENTIFIERs in my answer to a related question on DBA.StackExcange (including why it is represented by a combination of Big Endian and Little Endian).

like image 126
Solomon Rutzky Avatar answered Oct 12 '22 23:10

Solomon Rutzky