Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why (and how) to split column using master..spt_values?

Subquestioning the answer to question "Split one column into multiple rows" which I re-wrote here as [ 1 ].

What is the (meaning of) Type = 'P' and why to use undocumented master..spt_values for splitting a column? What is the benefit of it?


[ 1 ]

CREATE TABLE dbo.Table1 
(
    Col1        CHAR(1),
    Col2        CHAR(1),
    Col3        CHAR(1),
    Col4        VARCHAR(50)
)
GO

INSERT INTO dbo.Table1 VALUES ('A','B','C','1,2,3')
GO
INSERT INTO dbo.Table1 VALUES ('D','E','F','6,7,8,9')
GO


SELECT
    T.col1, RIGHT(LEFT(T.col4,Number-1),
    CHARINDEX(',',REVERSE(LEFT(','+T.col4,Number-1))))
FROM
    master..spt_values,
    table1 T
WHERE
    Type = 'P' AND Number BETWEEN 1 AND LEN(T.col4)+1 AND
    (SUBSTRING(T.col4,Number,1) = ','
    -- OR SUBSTRING(T.col4,Number,1)  = '') --this does not work correctly anyway

Related question:

  • What is the purpose of system table table master..spt_values and what are the meanings of its values?
like image 202

People also ask

What is master Spt_values in SQL Server?

to split column using master..spt_values. it contains numbers from 0 to 2047. It is very useful.for example if you need to populate a table with 100 numbers from this range. for building indexes. creating virtual calendars.


1 Answers

Purpose

Why use undocumented master..spt-values

Sybase, and therefore its bastard son MS SQL, provide various features and functions for the product, that is implemented in system procedures (as opposed to the binaries like sqlserver, which are started as a service). These system procedures procedures are written in SQL code and named sp_%. Except for some secret internals, they have the same limitations and needs as any other SQL code. They are part of the Sybase ASE or SQL Server product. As such, they are not required to document it; and the internal bits cannot be reasonably labelled as "undocumented".

master..spt_values contains all the various bits and pieces that the said system procedures need, in an SQL table, to produce the various reports. The sp means system procedure; spt means tables for system procedures; and of course values is the content.

Lookup Tables

What is the (meaning of) Type = 'P'

People often describe spt_values as "de-normalised", but that is the incorrect term. The correct term is folded, or packed. It is 26 or so logical Lookup tables, each beautifully Normalised, folded into one physical table, with a Type column to differentiate the logical tables.

Now in a normal database, that would be a gross error (just look at the answers for "one lookup table or many"). But in a server catalogue, it is desirable, it replaces 26 physical tables.

  • "L" stands for for LockType Lookup; "V" stands for DeviceType Lookup (V is short for Device throughout the server); etc. Type "P2" contains bitwise ordinals, for the expansion of bits that are packed into an INT.

  • A set of consecutive numbers within known bounds, that is available in the form of an SQL table is required, in order to perform a Projection, which many of the system procedures have to do. Type "P" is a list of consecutive numbers between 0 and 2047.

  • The term Projection is used here as the technically precise meaning, the natural logical sense, not the relational algebra meaning, which is unnatural.

There is therefore just one purpose for spt_values, to contain 26 folded, otherwise separate, Reference tables, and one Projection table.

Expansion

The ordinary use of spt_values then, is as an ordinary Lookup or Reference or ENUM table. First, the Lookup values:

    SELECT *                    -- list Genders
        FROM Gender 

It is used in the same way that Person has a GenderCode that needs to be expanded (very expanded, these freaky days):

    SELECT  P.*,                -- list Person
            G.Name              -- expand GenderCode to Name
        FROM Person P
        JOIN Gender G
            ON P.GenderCode = G.GenderCode

Eg. sp_lock produces a report of active locks, displaying lock types as string names. But master..syslocks contains lock types as numbers, it does not contain those names; and if it did, it would be a badly denormalised table ! If you execute the query (Sybase ASE code, you will have to convert):

    SELECT *                    -- list LockTypes
        FROM master..spt_values 
        WHERE type = "L"

you will notice 66 LockType numbers and names in the Lookup table. That allows sp_lock to execute simple code like Person::Gender above:

    SELECT  spid,               -- list Active Locks
            DB_NAME(dbid),
            OBJECT_NAME(id, dbid),
            v.name,             -- expand lock name
            page,
            row
    FROM master..syslocks   L,
         master..spt_values LT
    WHERE L.type = LT.number    -- 
    AND   type = "L"            -- LockType Lookup table
    ORDER by 1, 2, 3, 4, 5, 6   -- such that perusal is easy

Projection

What is the (meaning of) Type = 'P' ?

What is Projection and how is it used ?

Say, for example, instead of the active locks produced by the query above, you wanted a list of all 66 LockTypes, showing the number of active locks (or Null). You don't need a cursor, or a WHILE loop. We could Project the LockType Lookup table, through the count of active locks:

    SELECT  LT.name,            -- list LockTypes
            [Count] = (         -- with count
        SELECT COUNT(*)
            FROM master..syslocks
            WHERE type = LT.number
                )
        FROM master..spt_values LT
        WHERE type = "L"

There are several methods, that is just one. Another method is to use a Derived Table instead of the Subquery. But you still need the Projection.

That is typically what spt_values is used for, either Expansion or Projection. Now that you know it is there, you can use it too. It is safe (in the master database) and used by virtually all the system procedures, which means the system procedures cannot run without it.

for splitting a column?

Ah, you do not understand the "Split one CSV column into multiple rows" code.

  • Forget about spt_values for a moment, and examine that code again. It just needs a list of consecutive numbers, so that in can step through the list of values in the CSV column, byte by byte. The code is activated only for each byte that is a comma, or end-of-string.

  • Where to get a set of consecutive numbers in the form of an SQL table, rather than CREATing one from scratch and INSERTing into it? Why, master..spt_values of course. If you know it is there.

  • (You can learn a bit about the internals of ASE or SQL Server, just by reading the code of the system stored procedures.)

  • Note that any CSV field in one column is a gross Normalisation error, it breaks 2NF (contains repeating values) and 1NF (not atomic). Note, that is not packed or folded, it is a repeating group, it is un-normalised. One of the many negative consequences of such a gross error is, instead of using simple SQL to navigate the repeating group as rows, one has to use complex code to determine and extract the content of the un-normalised CSV field. Here spt_values P provides a vector for that complex code, making it easier.

What is the benefit of it?

I think I have answered that. If you did not have it, every system procedure that requires a list of Numbers would have to CREATE a temp table; and INSERT the rows into it; before running its code. Of course, not having to perform those steps, makes the system procedures much faster.

Now, when you need to perform a Projection, eg. calendar dates in the future, or whatever, you can use spt_values, instead of having to create your own temp table each time (or create your own private permanent table and maintain it).

like image 92
PerformanceDBA Avatar answered Nov 12 '22 07:11

PerformanceDBA