Is it possible to improve SQL Server 2008 R2 (and newer) insert performance by replacing (say) 50 float columns with a single binary(n) column (n being 50 x 4)?

I would presume that a fixed-size binary(n) should improve performance (the amount of data is the same, with less work needed to handle all the columns and shorter SQL statements), but many sites recommend against using binary columns, so I would like to know whether there are real issues with this approach.

Also, the table is rather denormalized and usually not all columns are filled with values, so varbinary(n) would allow me to reduce the row size in many cases. Sometimes only a single column is filled, but around 10 on average.

And the third question: how about going a step further and replacing (say) 5 rows x 50 float32 columns with a single varbinary(5*50*4)?
So it would be cool to get some insights into the pros and cons of replacing:

- 50 float columns with a single binary(200);
- 50 float columns with a single varbinary(204) (several bytes for flags/length info), to save space when columns are unused;
- 5 x 50 float columns with a single varbinary(1024) (several bytes for flags/length info).

The entire row is always read at once in all cases. (The three layouts are sketched below.)
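For concreteness, here is a rough sketch of the three layouts under discussion. The table and column names are my own illustrations, not the original schema; real (= float(24)) is used as the 4-byte float32 assumed by the 50 x 4 arithmetic above.

-- (1) 50 separate 4-byte float columns
CREATE TABLE dbo.Samples_Wide
(
    Timestamp_rounded datetime2(0) NOT NULL PRIMARY KEY CLUSTERED,
    Value_0ms   real NULL,
    Value_20ms  real NULL,
    Value_40ms  real NULL,
    -- ... Value_60ms through Value_960ms elided ...
    Value_980ms real NULL
);

-- (2) one fixed-size blob: 50 x 4 bytes packed together
CREATE TABLE dbo.Samples_Binary
(
    Timestamp_rounded datetime2(0) NOT NULL PRIMARY KEY CLUSTERED,
    Values_packed     binary(200)  NOT NULL
);

-- (3) one variable-size blob with a few leading bytes for flags/length info
--     (varbinary(1024) would hold 5 such rows in the "step further" variant)
CREATE TABLE dbo.Samples_VarBinary
(
    Timestamp_rounded datetime2(0) NOT NULL PRIMARY KEY CLUSTERED,
    Values_packed     varbinary(204) NULL
);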
(Update)
To clarify, the data being stored is:
Timestamp_rounded Value_0ms Value_20ms Value_40ms ... Value_980ms
2016-01-10 10:00:00 10.0 11.1 10.5 ... 10.5
I am always reading the entire row, the primary clustered key is the first column (Timestamp), and I will never have to query the table by any of the other columns.
Normalized data would obviously have a Timestamp/Value pair, where Timestamp would then have millisecond precision. But then I would have to store 50 rows of two columns instead of 1 row (Timestamp + BLOB).
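For comparison, a minimal sketch of that normalized alternative; the names and types are illustrative, not from the original post:

-- One row per 20 ms sample instead of one wide row per second.
-- datetime2(3) gives the millisecond precision mentioned above; real is a 4-byte float32.
CREATE TABLE dbo.Samples_Normalized
(
    SampleTime datetime2(3) NOT NULL,
    Value      real         NOT NULL,
    CONSTRAINT PK_Samples_Normalized PRIMARY KEY CLUSTERED (SampleTime)
);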
This is a bad idea. Having 50 columns of 4 bytes vs. having one column of 200 bytes obliterates any hope of optimizing the query for any of those 50 columns, both from a 'classic' SQL Server point of view and even more so as you start considering SQL Server's newer options. And all of that is without even considering the pain you inflict on your fellow humans trying to query the data.
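To make that concrete, a hedged illustration using the hypothetical tables sketched in the question above: a per-column predicate is trivial against 50 real columns, but turns into byte surgery against the packed blob.

-- Per-column layout: plain SQL, and the predicate could be supported by an index or statistics.
SELECT Timestamp_rounded
FROM dbo.Samples_Wide
WHERE Value_40ms > 10.0;

-- Packed layout: the third value must first be carved out of the bytes.
-- SUBSTRING works on binary, but the comparison below is a raw byte comparison,
-- not a numeric one, and nothing here is sargable or indexable.
SELECT Timestamp_rounded
FROM dbo.Samples_Binary
WHERE SUBSTRING(Values_packed, 9, 4) > 0x41200000;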
the issue is that the table is rather denormalized, and not all columns are filled with values usually, so varbinary(n) would allow me to reduce the row size in many cases. Sometimes only a single column is filled, but ~10 on average.
Then use row-compressed storage:
ALTER TABLE <your table> REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = ROW);
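If you want to gauge the benefit before rebuilding, SQL Server ships sp_estimate_data_compression_savings; the table name below is a placeholder.

EXEC sp_estimate_data_compression_savings
     @schema_name      = 'dbo',
     @object_name      = 'Samples_Wide',  -- placeholder table name
     @index_id         = NULL,
     @partition_number = NULL,
     @data_compression = 'ROW';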
If the data is append-only and seldom updated/deleted and most queries are analytical, then even better use columnstores. Since SQL Server 2016 SP1 columnstores are available across every SQL Server edition.
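A minimal sketch of that option (index and table names are placeholders): a clustered columnstore compresses repetitive numeric data very well and suits append-only, analytical workloads.

-- Convert the table to a clustered columnstore. An existing clustered rowstore
-- index/PK would have to be dropped or recreated as nonclustered first.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_Samples
ON dbo.Samples_Wide;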
As an experiment I tried out the two different methods to compare them.
I found that after some tuning the binary version was about 3X faster than the 50 col version.
This scenario is very specific and my test only tested something very specific. Any deviation from my test-setup will have an impact on the result.
How the test was made
For the 50 col version I had 50 nullable float columns, all of which I populated with float.MaxValue.

For the binary version I had a single column. The value for that column was constructed from a string of 50x float.MaxValue + "|", all concatenated into a single long string. The string was then converted to byte[] to be stored in the table.
Both tables were heaps with no indexes or constraints.
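Roughly, the two test tables would have looked something like this; the exact DDL is in the repository linked below, and the names here are my own:

-- 50-column version: 50 nullable float columns, no keys or indexes (a heap)
CREATE TABLE dbo.Test50Columns
(
    Col00 float NULL, Col01 float NULL, Col02 float NULL
    -- ... Col03 through Col49 elided ...
);

-- Binary version: a single blob holding the 50 concatenated values (also a heap)
CREATE TABLE dbo.TestBinary
(
    Payload varbinary(1024) NULL
);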
My test code can be found here: https://github.com/PeterHenell/binaryBulkInsertComparison
I ran the tests on SQL Server 2014 Developer Edition on a 6 Core workstation with SSD drives.