From what I've read, all Windows versions and .NET are little endian. So why the departure from the Microsoft norm for SQL Server?
What I mean by "SQL Server is big endian" is this:
SELECT CONVERT(VARBINARY, 255);
gives:
0x000000FF
and not
0xFF000000
the way something like .NET's BitConverter.GetBytes()
does. I guess SQL Server could be storing the number internally as little endian and then CONVERT
is just switching it around for some reason. But either way, why?
Edit:
Just noticed this...
DECLARE @q UNIQUEIDENTIFIER = '01234567-89ab-cdef-0123-456789abcdef';
SELECT @q;
SELECT CONVERT(VARBINARY, @q);
gave me:
01234567-89AB-CDEF-0123-456789ABCDEF
0x67452301AB89EFCD0123456789ABCDEF
What the heck?
For people who use languages that read left-to-right, big-endian seems like the natural way to think of storing a string of characters or numbers -- in the same order you expect to see it presented to you. In this way, many people consider big-endian as storing something in forward fashion, just as they read.
Another reason it exists is because it seems that it wasn't standardized back in the 1960s and 1970s; some companies (such as Intel with their x86 architecture) decided to go with little-endian (possibly due to the optimization reasoning above), whereas other companies selected big-endian.
If my computer reads bytes from left to right, and your computer reads from right to left, we're going to have issues when we need to communicate. Endianness means that the bytes in computer memory are read in a certain order. We won't have any issues if we never need to share information.
In the SAS System, the following platforms are considered big endian: IBM mainframe, HP-UX, AIX, Solaris, and Macintosh. The following platforms are considered little endian: VAX/VMS, AXP/VMS, Digital UNIX, Intel ABI, OS/2, and Windows.
Yes: Windows and .NET are Little Endian.
So why is SQL Server Big Endian? Easy: it's not ;-). The MSDN page for Collation and Unicode Support (within SQL Server) even states:
Because the Intel platform is a little endian architecture, Unicode code characters are always stored byte-swapped.
So why do you get a Big Endian binary value when converting the Int value of 255? Here is where the confusion is. This question is flawed because it is based on a false premise: that you should see the endianness of hardware and/or software reflected in the converted value. But why would you? Endianness affects the internal representation of a value, how it is stored. But it doesn't change the thing itself. You can convert a DATETIME
to an INT
and you will see an Integer. But if you save that Integer in an INT field, it will be stored as 4 bytes in reverse order since this is a Little Endian system. But that has nothing to do with what you see when you request that value back from the system and it is displayed to you.
For example, run the following to see that converting the INT
value of 301
to a BINARY(2)
results in 0x012D
, because 0x012D = 301, just in hexadecimal. And so converting 0x012D
back to INT
returns 301
, as expected. If the original Int to Binary conversion gave you 0x2D01, well, that does not equate to 301.
SELECT CONVERT(BINARY(2), 301), CONVERT(INT, 0x012D)
-- 0x012D, 301
HOWEVER, if you create a table with an INT
column, and insert a value of "301" into that column, and use DBCC PAGE
to view the data page as it exists on disk, you will see the following hex digits in the order shown:
2D 01 00 00
Also, to address some of the evidence supporting the premise of the question:
Yes, doing BitConverter.ToString(BitConverter.GetBytes(255))
in .NET will return:
FF-00-00-00
BUT, that is not a conversion as GetBytes()
is not converting the "value" but is instead intending on showing the internal system representation, which changes depending on if the system is Little Endian or Big Endian. If you look at the MSDN page for BitConverter.GetBytes, it might be clearer as to what it is actually doing.
When converting actual values, the results won't (and can't) be different across different systems. An integer value of 256 will always be 0x0100 across all systems (even calculators) because Endianness has nothing to do with how you convert values between base 10, base 2, base 16, etc.
In .NET, if you want to do this conversion, you can use String.Format("{0:X8}", 255)
which will return:
000000FF
which is the same as what SELECT CONVERT(BINARY(4), 255);
returns since they are both converting the value. This result isn't being shown as Big Endian, but is being shown as it truly is, which just happens to match the byte ordering of Big Endian.
In other words, when starting with a bit sequence of 100000000
, that can be represented in decimal form as 256
, or in hexadecimal form (known as BINARY
/ VARBINARY
within SQL Server) as 0x0100
. Endianness has nothing to do with this as these are merely different ways of representing the same underlying value.
Further evidence of SQL Server being Little Endian can be seen when converting between VARBINARY
and NVARCHAR
. Since NVARCHAR
is a 16-bit (i.e. 2 byte) encoding, we can see the byte ordering since there is no numeric equivalent for characters (unlike the 256 -> 0x0100 example) and so there is really nothing else to show (showing Code Point values is not an option due to Supplementary Characters).
As you can see below, a Latin capital A
, which has a Code Point of U+0041 (which is numerically the same as 65) converts to a VARBINARY
value of 0x4100
, because that is the UTF-16 Little Endian encoded value of that character:
SELECT CONVERT(VARBINARY(10), N'A'), -- 0x4100
CONVERT(NVARCHAR(5), 0x4100), -- A
CONVERT(INT, 0x4100), -- 16640
UNICODE(N'A'), -- 65
CONVERT(VARBINARY(8), 65); -- 0x00000041
SELECT CONVERT(VARBINARY(10), N'ᄀ'), -- 0x0011
CONVERT(NVARCHAR(5), 0x0011), -- ᄀ
CONVERT(INT, 0x0011), -- 17
UNICODE(N'ᄀ'), -- 4352
CONVERT(VARBINARY(8), 4352); -- 0x00001100
Also, the "Pile of Poo" emoji (Code Point U+01F4A9) can be seen using the Surrogate Pair "D83D + DCA9" (which the NCHAR
function allows for), or you can inject the UTF-16 Little Endian byte sequence:
SELECT NCHAR(0xD83D) + NCHAR(0xDCA9) AS [SurrogatePair],
CONVERT(NVARCHAR(5), 0x3DD8A9DC) AS [UTF-16LE];
-- 💩 💩
UNIQUEIDENTIFIER
is similar in that "what it is" and "how it is stored" are two different things and they do not need to match. Keep in mind that UUID / GUID is not a basic datatype like int
or char
, but is more of an entity that has a defined format, just like JPG or MP3 files. There is more discussion about UNIQUEIDENTIFIER
s in my answer to a related question on DBA.StackExcange (including why it is represented by a combination of Big Endian and Little Endian).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With