A while ago I asked a question about hierarchy/version number sorting in SQL Server (How Can I Sort A 'Version Number' Column Generically Using a SQL Server Query).
Among the answers that were submitted was this link to a TSQL Coding challenge with much the same puzzle.
In the SQL2000 solution the author demonstrated two variations, one using and returning a varchar and the other a varbinary. The author explains THAT he is doing this without explaining WHY.
So my question is really: what are the main differences/advantages (if any) between the two approaches? I.e., why use a varbinary instead of a varchar?
I've omitted the code, as it's most elegantly summed up in the above article.
The VARBINARY type is similar to the VARCHAR type, but stores binary byte strings rather than non-binary character strings. In a VARBINARY(M) declaration, M represents the maximum column length in bytes. The type has no character set, and comparison and sorting are based on the numeric value of the bytes.
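To make "comparison and sorting are based on the numeric value of the bytes" concrete, here is a minimal Python sketch rather than database code. It assumes plain ASCII input, and it uses `str.casefold` only as a rough stand-in for a case-insensitive collation; real collations follow far more elaborate rules.

```python
# Illustration only: Python bytes compare by numeric byte value,
# which is the behaviour described for binary string types.
words = ["banana", "Apple", "cherry", "apple", "Banana"]

# Byte-wise ordering: uppercase ASCII letters (65-90) sort before
# lowercase ones (97-122), with no language rules involved.
bytewise = sorted(words, key=lambda w: w.encode("ascii"))

# Rough stand-in for a case-insensitive collation (an assumption made
# for this sketch, not how any database implements collations).
case_insensitive = sorted(words, key=str.casefold)

print(bytewise)          # ['Apple', 'Banana', 'apple', 'banana', 'cherry']
print(case_insensitive)  # ['Apple', 'apple', 'banana', 'Banana', 'cherry']
```

The two orderings differ because the byte-wise sort never consults any character-level rules, which is also why it has less work to do per comparison.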
To guard against "Invalid mix of collations" errors, we can use varbinary. Varbinary also uses less space than varchar when the varchar column uses a multi-byte character set. (Binary strings don't have character sets or collations; they are merely sequences of byte values.)
binary, varbinary and varbinary(max) are the binary string data types in SQL Server. They are used to store raw binary data: binary(n) and varbinary(n) hold up to 8,000 bytes, and varbinary(max) holds up to 2^31 - 1 bytes. The contents of image files (BMP, TIFF, GIF, or JPEG format files), Word files, text files, etc. are examples of binary data.
The key difference between varchar and nvarchar is the way they are stored: varchar is stored as regular 8-bit data (1 byte per character), while nvarchar stores data at 2 bytes per character. For this reason, nvarchar(n) can hold up to 4,000 characters (versus 8,000 for varchar(n)), and it takes double the space of varchar for the same text.
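The 1-byte versus 2-byte storage difference can be checked with a small Python sketch. The encodings here are stand-ins chosen for illustration: `cp1252` approximates a single-byte varchar code page and `utf-16-le` approximates 2-byte-per-character nvarchar storage. The actual bytes on disk depend on the column's collation and on any compression settings.

```python
# Illustration only: byte counts for the same text under a single-byte
# encoding and a 2-byte-per-character encoding.
text = "1.10.2"

single_byte = text.encode("cp1252")     # stand-in for varchar storage
double_byte = text.encode("utf-16-le")  # stand-in for nvarchar storage

print(len(text))         # 6 characters
print(len(single_byte))  # 6 bytes
print(len(double_byte))  # 12 bytes
```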
I believe the expectation is that the varbinary data will generally consume fewer bytes (5) than the varchar data (10 or 11, I think) per portion of the original string, so for a very large number of components or comparisons it should be more efficient.
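The size argument can be illustrated with a short Python sketch. This is not the code from the linked article: the helper names `version_key_bytes` and `version_key_text`, the 4-byte width per component, and the 10-digit padding are all assumptions picked for the example. Both keys sort versions numerically, but the packed key is much smaller.

```python
import struct

def version_key_bytes(version: str) -> bytes:
    """Pack each dotted component as a 4-byte big-endian unsigned integer
    (hypothetical encoding; the width is an assumption for this sketch)."""
    return b"".join(struct.pack(">I", int(part)) for part in version.split("."))

def version_key_text(version: str, width: int = 10) -> str:
    """Zero-pad each dotted component to a fixed number of decimal digits
    (hypothetical encoding; the width is an assumption for this sketch)."""
    return "".join(part.zfill(width) for part in version.split("."))

versions = ["1.10.2", "1.2.10", "1.2.9", "10.0.0", "2.0.0"]

# A plain string sort gets the order wrong ("1.10.2" before "1.2.9").
naive = sorted(versions)

# Both fixed-width keys give the numeric order.
sorted_by_bytes = sorted(versions, key=version_key_bytes)
sorted_by_text = sorted(versions, key=version_key_text)

print(sorted_by_bytes)                     # ['1.2.9', '1.2.10', '1.10.2', '2.0.0', '10.0.0']
print(len(version_key_bytes("1.10.2")))    # 12 bytes for three components
print(len(version_key_text("1.10.2")))     # 30 characters for three components
```

Big-endian packing matters here: it puts the most significant byte first, so a byte-by-byte comparison agrees with numeric order.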
But if you were looking to use either solution, I'd recommend that you implement both (they're quite short) and profile them against your real data (and query patterns) to see if there are practical differences (I wouldn't expect so).
(Crafty Steal): And as Martin points out, the binary comparisons will be more efficient, since they won't involve all of the code that's there to deal with collations. :-)