
How to get number of chars in string in Transact SQL, the "other way"

We faced a very strange issue (really strange for such a mature product): how to get the number of characters in a Unicode string using Transact-SQL statements. The key problem is that the len() T-SQL function returns the number of characters excluding trailing blanks. The other variant is to use datalength() (which returns the number of bytes) and divide by 2 to get the number of Unicode characters. But Unicode characters can be surrogate pairs, so that won't work either.

We have two candidate solutions: the first is to use len(replace()), and the second is to append a single non-blank character and then subtract 1 from the result. But IMO both variants are rather ugly.

declare @txt nvarchar(10)
set @txt = N'stack '   -- one trailing space: 6 characters in total
select @txt as variable,
 len(@txt) as lenBehaviour,                        -- 5: trailing blank ignored
 datalength(@txt)/2 as datalengthBehaviour,        -- 6: 12 bytes / 2
 len(replace(@txt,' ','O')) as ReplaceBehaviour,   -- 6
 len(@txt+'.')-1 as addAndMinusBehaviour           -- 6

Any other ideas for how to count the characters in a string with trailing spaces?

Alexey Shcherbak asked Jun 22 '09 08:06



3 Answers

I can't leave a comment, so I will have to leave an answer (or shut up).

My vote would be for the addAndMinusBehaviour.

I haven't got a good third alternative. There may be some obscure whitespace rules to fiddle with in the options / SET / collation assignment, but I don't know more detail off the top of my head.

But really, addAndMinusBehaviour is probably the easiest to implement, the fastest to execute and, if you document it, fairly maintainable as well.
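
One way to keep the trick documented in a single place is to wrap it in a scalar function. This is only a sketch: the function name ufn_LenWithTrailingSpaces is made up for illustration, and the parameter is declared as nvarchar(max) on the assumption that appending the extra character should never be truncated.

```sql
-- Hypothetical helper (name and signature are illustrative, not from the question).
CREATE FUNCTION dbo.ufn_LenWithTrailingSpaces (@pInput NVARCHAR(MAX))
RETURNS BIGINT
AS
BEGIN
    -- LEN ignores trailing blanks, so append a non-blank sentinel first:
    -- the original trailing spaces are then interior characters and get counted.
    -- Subtract 1 to discount the sentinel itself.
    RETURN LEN(@pInput + N'.') - 1;
END
GO

SELECT dbo.ufn_LenWithTrailingSpaces(N'stack ') AS withTrailingSpaces, -- 6
       LEN(N'stack ') AS plainLen;                                     -- 5
```

A NULL input yields NULL, the same as plain LEN, because concatenating with NULL gives NULL under the default CONCAT_NULL_YIELDS_NULL setting.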

Robin Vessey answered Nov 02 '22 13:11



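-- Counts how many times @pSearchChar occurs in @pInput by comparing the
-- length before and after removing every occurrence of that character.
-- Caveat: LEN ignores trailing blanks, so the result can be off when
-- either string ends in spaces.
CREATE FUNCTION [dbo].[ufn_CountChar] ( @pInput VARCHAR(1000), @pSearchChar CHAR(1) )
RETURNS INT
AS
BEGIN
    RETURN (LEN(@pInput) - LEN(REPLACE(@pInput, @pSearchChar, '')))
END
GO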
Michael Ritchie answered Nov 02 '22 12:11



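My understanding is that DATALENGTH(@txt)/2 should always give you the number of characters. SQL Server stores nchar/nvarchar data as UCS-2, which does not support surrogate pairs, so every character takes exactly 2 bytes. (This holds for collations without supplementary-character support; with an _SC collation, available from SQL Server 2012, the data is treated as UTF-16 and a surrogate pair takes 4 bytes.)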

http://msdn.microsoft.com/en-us/library/ms186939.aspx

http://en.wikipedia.org/wiki/UCS2
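
To see how the byte-based count relates to the character count, here is a small sketch. It assumes SQL Server 2012 or later, where the supplementary-character collation Latin1_General_100_CI_AS_SC is available. The test string, built from a surrogate pair with NCHAR, is made up for illustration.

```sql
-- Illustration only: 'a', one supplementary character (U+1F600, stored as a
-- surrogate pair of two 2-byte code units), and one trailing space.
declare @s nvarchar(10)
set @s = N'a' + nchar(55357) + nchar(56832) + N' '

select
 datalength(@s) as bytes,          -- 8
 datalength(@s)/2 as codeUnits,    -- 4: 2-byte code units, trailing space included
 -- Under an _SC collation LEN treats a surrogate pair as a single character;
 -- the appended '.' keeps the trailing space from being dropped.
 len((@s + N'.') collate Latin1_General_100_CI_AS_SC) - 1 as charsWithSC
```

So DATALENGTH/2 counts trailing spaces correctly, but it counts code units, which only matches the number of characters as long as the string contains no surrogate pairs.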

Kim Major answered Nov 02 '22 12:11
