Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Length of integer in SQL (i.e. length of decimal string)

Quick version: Which is the best of the following and why? (or is there a better way):

SELECT FLOOR(LOG10(Number))+1 AS NumLength FROM Table
SELECT LEN(CONVERT(VARCHAR, Number)) AS NumLength FROM Table
SELECT LEN(CAST(Number AS VARCHAR(10))) AS NumLength FROM Table

A bit more detail:
I wish to identify the most efficient mechanism for calculating the length of the string representation of an integer (more specifically a natural number - always >0).

I'm using MS SQL Server (2005).

I've come up with the 3 solutions above, all of which seem to work fine.

I know the third version may have issues with very large integers, but for now we may assume that "Number" is never more than 9 decimal digits long.

Yet more detail: (you don't have to read this bit to answer my question)
This query is used heavily in a transaction processing environment.
Up to now, I have got away with the assumption that "Number" is always exactly 6 digits long.
However, now I must update the code to support anywhere from 4 to 9 digits.

This SQL is part of a condition to identify the card scheme of a card.

The full query attempts to find the records matching the start of the card number against the Start and End range.

So full SQL condition would be something like this:

WHERE 
-- Start and End match
((Start=End OR End=0) AND (Start=CAST(LEFT('<card number>', FLOOR(LOG10(Start))+1) AS BIGINT))) OR 

-- Start != End
-- >= Start
(Start<=CAST(LEFT('<card number>', FLOOR(LOG10(Start))+1) AS BIGINT) AND 
-- <= End
End>=CAST(LEFT('<card number>', FLOOR(LOG10(Start))+1) AS BIGINT))

NOTE:
I can redesign the table to use VARCHARs instead of INTs. This would allow me to use "LEN(Start)" instead of "FLOOR(LOG10(Start))+1)" however the condition will then have much more CASTs.
I'd prefer to continue to deal in INTs as the DB schema will stay the same, and in any case dealing with INTs should be faster than VARCHARs.

IF I change the fields to VARCHARs, my condition might be:

WHERE 
-- Start and End match
((Start=End OR LEN(End)=0) AND (Start=LEFT('<card number>', LEN(Start)))) OR 

-- Start != End
-- >= Start
(CAST(Start AS BIGINT)<=CAST(LEFT('<card number>', LEN(Start)) AS BIGINT) AND 
-- <= End
CAST(End AS BIGINT)>=CAST(LEFT('<card number>', LEN(Start)) AS BIGINT))

Many thanks for any help,
Dave

like image 268
user390935 Avatar asked Mar 21 '11 13:03

user390935


2 Answers

On my machine versions 2 and 3 come out about equal and beat the other two.

Edit: Though it has just occurred to me that my original test was a bit unfair on CASE as ordering the statements in ascending numerical order means that only 10 possible numbers would meet the first condition and exit early. I've added an additional test below. You might also try nesting CASE statements to do a binary search.

SET NOCOUNT ON
SET STATISTICS TIME ON

  PRINT 'Test 1';

   WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),
        E02(N) AS (SELECT 1 FROM E00 a, E00 b),
        E04(N) AS (SELECT 1 FROM E02 a, E02 b),
        E08(N) AS (SELECT 1 FROM E04 a, E04 b),
        E16(N) AS (SELECT 1 FROM E08 a, E08 b),
        E32(N) AS (SELECT 1 FROM E16 a, E16 b),
   cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E32)
 SELECT MAX(FLOOR(LOG10(N))+1)
   FROM cteTally
  WHERE N <= 10000000;

  PRINT 'Test 2';

     WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),
        E02(N) AS (SELECT 1 FROM E00 a, E00 b),
        E04(N) AS (SELECT 1 FROM E02 a, E02 b),
        E08(N) AS (SELECT 1 FROM E04 a, E04 b),
        E16(N) AS (SELECT 1 FROM E08 a, E08 b),
        E32(N) AS (SELECT 1 FROM E16 a, E16 b),
   cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E32)
 SELECT MAX(LEN(CONVERT(VARCHAR, N)))
   FROM cteTally
  WHERE N <= 10000000;


  PRINT 'Test 3';

     WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),
        E02(N) AS (SELECT 1 FROM E00 a, E00 b),
        E04(N) AS (SELECT 1 FROM E02 a, E02 b),
        E08(N) AS (SELECT 1 FROM E04 a, E04 b),
        E16(N) AS (SELECT 1 FROM E08 a, E08 b),
        E32(N) AS (SELECT 1 FROM E16 a, E16 b),
   cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E32)
 SELECT MAX(LEN(CAST(N AS VARCHAR(10))))
   FROM cteTally
  WHERE N <= 10000000;

  PRINT 'Test 4';

     WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),
        E02(N) AS (SELECT 1 FROM E00 a, E00 b),
        E04(N) AS (SELECT 1 FROM E02 a, E02 b),
        E08(N) AS (SELECT 1 FROM E04 a, E04 b),
        E16(N) AS (SELECT 1 FROM E08 a, E08 b),
        E32(N) AS (SELECT 1 FROM E16 a, E16 b),
   cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E32)
SELECT MAX(CASE
             WHEN N < 10 THEN 1
             WHEN N < 100 THEN 2
             WHEN N < 1000 THEN 3
             WHEN N < 10000 THEN 4
             WHEN N < 100000 THEN 5
             WHEN N < 1000000 THEN 6
             WHEN N < 10000000 THEN 7
             WHEN N < 100000000 THEN 8
           END)
FROM   cteTally
WHERE  N <= 10000000;   

  PRINT 'Test 5';

     WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),
        E02(N) AS (SELECT 1 FROM E00 a, E00 b),
        E04(N) AS (SELECT 1 FROM E02 a, E02 b),
        E08(N) AS (SELECT 1 FROM E04 a, E04 b),
        E16(N) AS (SELECT 1 FROM E08 a, E08 b),
        E32(N) AS (SELECT 1 FROM E16 a, E16 b),
   cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E32)
 SELECT MAX(CASE 
             WHEN N >= 100000000 THEN NULL
             WHEN N >= 10000000 THEN 8
             WHEN N >= 1000000  THEN 7
             WHEN N >= 100000   THEN 6
             WHEN N >= 10000    THEN 5
             WHEN N >= 1000     THEN 4
             WHEN N >= 100      THEN 3
             WHEN N >= 10       THEN 2   
             ELSE                    1
            END   )
   FROM cteTally
  WHERE N <= 10000000;

Results from an example run on my machine are

Test 1
   CPU time = 9422 ms,  elapsed time = 9523 ms.

Test 2
   CPU time = 7021 ms,  elapsed time = 7130 ms.

Test 3
   CPU time = 6864 ms,  elapsed time = 7006 ms.

Test 4
   CPU time = 9328 ms,  elapsed time = 9456 ms.

Test 5
   CPU time = 6989 ms,  elapsed time = 7358 ms.    
like image 167
Martin Smith Avatar answered Oct 27 '22 00:10

Martin Smith


To answer your question, the second version is clearer about what you actually want. Think about what someone looking at this code in six months will think: will they realize that the first version is trying obtain the length of a number represented in decimal, or will they think that you're performing some obscure mathematical operation that they can't find documentation requiring?

More generally, though, you should probably consider storing these values as character data anyway, since they aren't representing real "numbers" to you (you aren't comparing based upon relative value, you aren't performing arithmetic, etc.). You can use CHECK constraints to ensure that only numeric digits are in the field.

I'm not clear on why storing them as character data would require conversions in your queries, assuming that you're consistent. There's also no reason to assume that dealing with ints would be faster than varchar, especially if there's conversion involved in both cases.

like image 21
Adam Robinson Avatar answered Oct 26 '22 23:10

Adam Robinson