I have a table with a column of type VARCHAR(MAX) with COLLATE Latin1_General_100_CI_AS_SC_UTF8 in a SQL Server database.
I want to compress the values in the column with built-in COMPRESS() function. But, I am unable to get the original values back via DECOMPRESS() function. What is the problem and how to fix it?
Casting it to NVARCHAR(MAX) after decompression does not work.
Here is a repro of the issue:
DECLARE @test TABLE ([value] VARCHAR(MAX) COLLATE Latin1_General_100_CI_AS_SC_UTF8);
INSERT INTO @test VALUES (N'ададада');
SELECT 
    [value],
    CAST(DECOMPRESS(COMPRESS([value])) AS VARCHAR(MAX)) COLLATE Latin1_General_100_CI_AS_SC_UTF8 AS [decompress_1],
    CAST(DECOMPRESS(COMPRESS([value])) AS NVARCHAR(MAX)) AS [decompress_2]
FROM
    @test;

COLLATEing the value after you have converted it is, unfortunately, too late. CONVERT/CAST will convert to a varchar of the databases collation, so if that value was in a different collation, it'll be implicitly converted back to the databases (for example Windows 1252). If you're working in a UTF-8 collated database, then the problem won't occur, and you can use a UTF-8 varchar the whole time. If you aren't, however, then you're better off CONVERTing/CASTing to an nvarchar, and the COLLATE the decompressed value and finally CONVERT/CAST it to a varchar:
DECLARE @test TABLE ([value] varchar(MAX) COLLATE Latin1_General_100_CI_AS_SC_UTF8);
INSERT INTO @test VALUES (N'ададада');
SELECT [value],
       CONVERT(varchar(MAX),CONVERT(nvarchar(MAX),DECOMPRESS(COMPRESS(CONVERT(nvarchar(MAX),[value])))) COLLATE Latin1_General_100_CI_AS_SC_UTF8) AS [decompress_2]
FROM @test;
db<>fiddle
Truthfully, if you're not in a UTF-8 collated database, I would personally avoid UTF-8 collated columns. Use nvarchar if you need 2 byte character values.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With