
Split string into table given row delimiter and column delimiter in SQL Server

How can I split a string containing a matrix into a table in SQL Server? The string has column and row delimiters.

Suppose I have a string:

declare @str varchar(max)='A,B,C;D,E,F;X,Y,Z';

Expected results (in three separate columns):

+---+---+---+
| A | B | C |
+---+---+---+
| D | E | F |
+---+---+---+
| X | Y | Z |
+---+---+---+

I am looking for a general solution that does not assume a fixed number of columns and rows. So the string:

declare @str varchar(max)='A,B;D,E';

will be split into a table with two columns:

+---+---+
| A | B |
+---+---+
| D | E |
+---+---+

My efforts. My first idea was to use dynamic SQL that turns the string into: insert into dbo.temp values (...). This approach, although very fast, has a minor drawback: it requires first creating a table with the right number of columns. I have presented this method in an answer to my own question below, just to keep the question short.
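
As a rough sketch of that first idea (not the exact code from the self-answer), the delimiters can be rewritten into VALUES syntax with nested REPLACE calls. The temporary table #temp, its column names and the varchar(100) width are assumptions made for illustration; the number of columns still has to be known up front, which is the drawback mentioned above.

```sql
declare @str varchar(max) = 'A,B,C;D,E,F;X,Y,Z';

-- Assumption: the target table already exists with the right number of columns.
create table #temp (Col1 varchar(100), Col2 varchar(100), Col3 varchar(100));

-- Double any embedded single quotes, then turn the delimiters into VALUES syntax:
--   A,B,C;D,E,F  ->  ('A','B','C'),('D','E','F')
-- Commas are replaced before semicolons because the row separator '),(' itself contains a comma.
declare @values nvarchar(max) =
    N'(''' + replace(replace(replace(@str, '''', ''''''), ',', ''','''), ';', '''),(''') + N''')';

declare @sql nvarchar(max) = N'insert into #temp values ' + @values + N';';

-- The temp table created above is visible inside the dynamic batch.
exec sp_executesql @sql;

select Col1, Col2, Col3 from #temp;

drop table #temp;
```

A single INSERT ... VALUES list is limited to 1000 rows, so a longer string would have to be sent in batches.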

Another idea would be to write the string to a CSV file on the server and then bulk insert from it. However, I do not know how to do that, or how the performance of the two ideas would compare.

The reason I asked the question is that I want to import data from Excel to SQL Server. Having experimented with different ADO approaches, I found that sending a matrix string wins by a landslide, especially as the length of the string increases. I asked a younger twin brother of this question here: Turn Excel range into VBA string, where you will find suggestions on how to prepare such a string from an Excel range.

Bounty: I decided to award the bounty to Matt. I also rated Sean Lange's answer highly. Thank you, Sean. I liked Matt's answer for its simplicity and brevity. Approaches other than Matt's and Sean's could be used in parallel, so for the time being I am not accepting any answer (update: finally, after a few months, I have accepted Matt's answer). I wish to thank Ahmed Saeed for his idea with VALUES, for it is a nice evolution of the answer I began with. Of course, it is no match for Matt's or Sean's. I upvoted every answer. I will appreciate any feedback from you on using these methods. Thank you for the quest.

asked Sep 28 '16 15:09 by Przemyslaw Remin




3 Answers

OK, this puzzle intrigued me, so I decided to see if I could do this without any looping. There are a couple of prerequisites for this to work. The first is that we will assume you have some sort of tally table. In case you don't have one, here is the code for mine. I keep this on every system I use.

create View [dbo].[cteTally] as

WITH
    E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),
    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
    cteTally(N) AS 
    (
        SELECT  ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
    )
select N from cteTally

The second piece of this puzzle is a set-based string splitter. My preference for this is the uber-fast Jeff Moden splitter. One caveat is that it only works with varchar values of up to 8,000 characters. This is plenty for most delimited strings I work with. You can find Jeff Moden's splitter (DelimitedSplit8K) here.

http://www.sqlservercentral.com/articles/Tally+Table/72993/

Last but not least, the technique I am using here is a dynamic cross tab. This is something else I learned from Jeff Moden. He has a great article on the subject here.

http://www.sqlservercentral.com/articles/Crosstab/65048/
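
For a fixed number of columns, the cross tab that the dynamic SQL has to generate can be written by hand. The following is a minimal sketch for three columns, assuming dbo.DelimitedSplit8K (from the link above) is installed. The aliases RowNum and ColNum are chosen here for readability and differ from the names used in the dynamic version.

```sql
declare @str varchar(8000) = 'A,B,C;D,E,F;X,Y,Z';

with OrderedResults as
(
    -- The outer split yields one item per row; the inner split yields one item per column.
    select s.ItemNumber as RowNum
         , x.ItemNumber as ColNum
         , x.Item
    from dbo.DelimitedSplit8K(@str, ';') s
    cross apply dbo.DelimitedSplit8K(s.Item, ',') x
)
-- One MAX(CASE ...) expression per output column; this is the part the dynamic SQL builds.
select MAX(case when ColNum = 1 then Item end) as Column1
     , MAX(case when ColNum = 2 then Item end) as Column2
     , MAX(case when ColNum = 3 then Item end) as Column3
from OrderedResults
group by RowNum
order by RowNum;
```

The dynamic version only has to repeat the MAX(CASE ...) line once per column number taken from the tally table.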

Putting all of this together, you can come up with something like this, which will be really fast and will scale well.

declare @str varchar(max)='A,B,C;D,E,F;X,Y,Z';

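-- Double any embedded single quotes so the string can be placed safely inside the dynamic SQL.
declare @StaticPortion nvarchar(max) = 
'declare @str varchar(max)=''' + replace(@str, '''', '''''') + ''';with OrderedResults as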
    (
        select s.ItemNumber
            , s.Item as DelimitedValues
            , x.ItemNumber as RowNum
            , x.Item
        from dbo.DelimitedSplit8K(@str, '';'') s
        cross apply dbo.DelimitedSplit8K(s.Item, '','') x
    )
    select '

declare @DynamicPortion nvarchar(max) = '';
declare @FinalStaticPortion nvarchar(2000) = ' from OrderedResults group by ItemNumber order by ItemNumber';

select @DynamicPortion = @DynamicPortion + 
    ', MAX(Case when RowNum = ' + CAST(N as varchar(6)) + ' then Item end) as Column' + CAST(N as varchar(6)) + CHAR(10)
from cteTally t
where t.N <= (select MAX(len(Item) - LEN(replace(Item, ',', ''))) + 1
                from dbo.DelimitedSplit8K(@str, ';')
            )

declare @SqlToExecute nvarchar(max) = @StaticPortion + stuff(@DynamicPortion, 1, 1, '') + @FinalStaticPortion
exec sp_executesql @SqlToExecute

--EDIT--

Here is the DelimitedSplit8K function in case the link becomes invalid.

CREATE FUNCTION [dbo].[DelimitedSplit8K]
--===== Define I/O parameters
        (@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...
     -- enough to cover VARCHAR(8000)
  WITH E1(N) AS (
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                ),                          --10E+1 or 10 rows
       E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
       E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
 cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front
                     -- for both a performance gain and prevention of accidental "overruns"
                 SELECT 0 UNION ALL
                 SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
                ),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
                 SELECT t.N+1
                   FROM cteTally t
                  WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0) 
                )
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
 SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),
        Item       = SUBSTRING(@pString,s.N1,ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000))
   FROM cteStart s
;

answered Oct 15 '22 14:10 by Sean Lange


One of the easier ways is to convert the string to XML based on replacing your delimiters.

declare @str varchar(max)='A,B,C;D,E,F;X,Y,Z';
DECLARE @xmlstr XML
SET @xmlstr = CAST(('<rows><row><col>' + REPLACE(REPLACE(@str,';','</col></row><row><col>'),',','</col><col>') + '</col></row></rows>') AS XML)

SELECT
    t.n.value('col[1]','CHAR(1)') as Col1
    ,t.n.value('col[2]','CHAR(1)') as Col2
    ,t.n.value('col[3]','CHAR(1)') as Col3
FROM
    @xmlstr.nodes ('/rows/row') AS t(n)
  • Format the string as XML: <rows><row><col></col><col></col></row><row><col></col><col></col></row></rows>. Basically, you add the beginning and ending tags, then replace the column delimiter with the column tags and the row delimiter with both the column and row tags.
  • .nodes is a method on the xml data type that "is useful when you want to shred an xml data type instance into relational data": https://msdn.microsoft.com/en-us/library/ms188282.aspx
  • as t(n) defines how you access the XML rows and columns: t is the table alias and n is the node alias (kind of like a row), so t.n.value() reads a value from a particular row.
  • col[1] means get the first col tag in the row. It is 1-based, so 2 is the next, then 3, etc.
  • CHAR(1) is a data type definition meaning 1 character; it was based on your example data having only 1 character per column. You may notice I made it NVARCHAR(MAX) in the dynamic query, because if the data type is unknown you will want more flexibility.

Or dynamically

DECLARE @str varchar(max)='A,B,C,D,E;F,G,H,I,J;K,L,M,N,O';
DECLARE @NumOfColumns INT
SET @NumOfColumns = (LEN(@str) - LEN(REPLACE(@str,',',''))) / (LEN(@str) - LEN(REPLACE(@str,';','')) + 1) + 1

DECLARE @xmlstr XML
SET @xmlstr = CAST(('<rows><row><col>' + REPLACE(REPLACE(@str,';','</col></row><row><col>'),',','</col><col>') + '</col></row></rows>') AS XML)

DECLARE @ParameterDef NVARCHAR(MAX) = N'@XMLInputString xml'
DECLARE @SQL NVARCHAR(MAX) = 'SELECT '

DECLARE @i INT = 1

WHILE @i <= @NumOfColumns
BEGIN
    SET @SQL = @SQL + IIF(@i > 1,',','') + 't.n.value(''col[' + CAST(@i AS VARCHAR(10)) + ']'',''NVARCHAR(MAX)'') as Col' + CAST(@i AS VARCHAR(10))

    SET @i = @i + 1
END

SET @SQL = @SQL + ' FROM
    @XMLInputString.nodes (''/rows/row'') AS t(n)'

EXECUTE sp_executesql @SQL,@ParameterDef,@XMLInputString = @xmlstr
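
The column count above is computed by arithmetic on the delimiters. As an alternative sketch, it can be read from the generated XML itself with the XQuery count() function. This assumes every row has the same number of columns as the first one.

```sql
DECLARE @str varchar(max) = 'A,B,C,D,E;F,G,H,I,J;K,L,M,N,O';

DECLARE @xmlstr XML
SET @xmlstr = CAST(('<rows><row><col>' + REPLACE(REPLACE(@str,';','</col></row><row><col>'),',','</col><col>') + '</col></row></rows>') AS XML)

-- Count the <col> elements inside the first <row> element.
DECLARE @NumOfColumns INT
SET @NumOfColumns = @xmlstr.value('count(/rows/row[1]/col)', 'INT')

SELECT @NumOfColumns AS NumOfColumns   -- 5 for the sample string
```

The rest of the dynamic query can stay as it is, since it only needs @NumOfColumns.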

answered Oct 15 '22 13:10 by Matt


The code below should work in SQL Server. It uses common table expressions and dynamic SQL with a little manipulation. Just assign the string value to the @Str variable and execute the complete code in one go. Since it uses CTEs, it is easy to analyze the data at each step.

Declare @Str varchar(max)= 'A,B,C;D,E,F;X,Y,Z';

IF OBJECT_ID('tempdb..#RawData') IS NOT NULL
    DROP TABLE #RawData;
;WITH T_String AS
(
    SELECT  RIGHT(@Str,LEN(@Str)-CHARINDEX(';',@Str,1)) AS RawString, LEFT(@Str,CHARINDEX(';',@Str,1)-1) AS RowString, 1 AS CounterValue,  len(@Str) - len(replace(@Str, ';', '')) AS RowSize
    --
    UNION ALL
    --
    SELECT  IIF(CHARINDEX(';',RawString,1)=0,NULL,RIGHT(RawString,LEN(RawString)-CHARINDEX(';',RawString,1))) AS RawString, IIF(CHARINDEX(';',RawString,1)=0,RawString,LEFT(RawString,CHARINDEX(';',RawString,1)-1)) AS RowString, CounterValue+1 AS CounterValue, RowSize AS RowSize
    FROM    T_String AS r
    WHERE   CounterValue <= RowSize
)
,T_Columns AS
(
    SELECT  RowString AS RowValue, RIGHT(a.RowString,LEN(a.RowString)-CHARINDEX(',',a.RowString,1)) AS RawString, 
            LEFT(a.RowString,CHARINDEX(',',a.RowString,1)-1) AS RowString, 1 AS CounterValue,  len(a.RowString) - len(replace(a.RowString, ',', '')) AS RowSize
    FROM    T_String AS a
    --WHERE a.CounterValue = 1
    --
    UNION ALL
    --
    SELECT  RowValue, IIF(CHARINDEX(',',RawString,1)=0,NULL,RIGHT(RawString,LEN(RawString)-CHARINDEX(',',RawString,1))) AS RawString, IIF(CHARINDEX(',',RawString,1)=0,RawString,LEFT(RawString,CHARINDEX(',',RawString,1)-1)) AS RowString, CounterValue+1 AS CounterValue, RowSize AS RowSize
    FROM    T_Columns AS r
    WHERE   CounterValue <= RowSize
)
,T_Data_Prior2Pivot AS 
(
    SELECT  c.RowValue, c.RowString, c.CounterValue
    FROM    T_Columns AS c
    INNER JOIN
            T_String AS r
        ON  r.RowString = c.RowValue
)
SELECT  *
INTO    #RawData
FROM    T_Data_Prior2Pivot;

DECLARE @columnNames VARCHAR(MAX)
        ,@sqlQuery VARCHAR(MAX)
SELECT @columnNames = COALESCE(@columnNames+', ['+CAST(CounterValue AS VARCHAR)+']','['+CAST(CounterValue AS VARCHAR)+']') FROM (SELECT DISTINCT CounterValue FROM #RawData) T
PRINT @columnNames

SET @sqlQuery = '
SELECT  '+@columnNames+'
FROM    ( SELECT * FROM #RawData 
        ) AS b
PIVOT   (MAX(RowString) FOR CounterValue IN ('+@columnNames+')) AS p
'

EXEC (@sqlQuery);

[Image: statistics screenshot for the above query, from http://statisticsparser.com/]

answered Oct 15 '22 14:10 by Ajay Dwivedi