How to split a string containing a matrix into a table in SQL Server? The string has column and row delimiters.
Suppose I have a string:
declare @str varchar(max)='A,B,C;D,E,F;X,Y,Z';
Expected results (in three separate columns):
+---+---+---+
| A | B | C |
+---+---+---+
| D | E | F |
+---+---+---+
| X | Y | Z |
+---+---+---+
I am looking for a general solution that does not assume a fixed number of columns or rows. So the string:
declare @str varchar(max)='A,B;D,E';
will be split into a table with two columns:
+---+---+
| A | B |
+---+---+
| D | E |
+---+---+
My efforts so far: my first idea was to use dynamic SQL that turns the string into:
insert into dbo.temp values (...)
This approach, although very fast, has a minor drawback: it requires creating a table with the right number of columns first. I have presented this method in the answer to my own question below, just to keep the question short.
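For orientation, here is a minimal sketch of what that dynamic SQL idea could look like. It is not the code from the self-answer: the temp table #temp, its column names c1 to c3, and the varchar(10) width are assumptions made for illustration, and the table still has to be created with the right number of columns up front.

```sql
declare @str varchar(max) = 'A,B,C;D,E,F;X,Y,Z';

-- Assumed target table: three columns to match the sample string.
create table #temp (c1 varchar(10), c2 varchar(10), c3 varchar(10));

-- Double any single quotes first, then quote the column delimiter
-- and turn the row delimiter into a boundary between row constructors.
declare @sql nvarchar(max) =
    N'insert into #temp values ('''
    + replace(replace(replace(@str, '''', ''''''), ',', ''','''), ';', '''),(''')
    + N''');';

-- @sql is now:
-- insert into #temp values ('A','B','C'),('D','E','F'),('X','Y','Z');
exec sp_executesql @sql;

select c1, c2, c3 from #temp;
drop table #temp;
```

One thing to keep in mind with this shape: a single INSERT ... VALUES list accepts at most 1000 rows, so longer strings would have to be sent in batches or built differently.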
Another idea would be to write the string to a CSV file on the server and then bulk insert from it. However, I do not know how to do that, or how the performance of the first and second ideas would compare.
The reason I asked the question is that I want to import data from Excel into SQL Server. I have experimented with different ADO approaches, and this method of sending a matrix string wins by a landslide, especially as the length of the string increases. I asked a younger twin brother of this question here: Turn Excel range into VBA string, where you will find suggestions on how to prepare such a string from an Excel range.
I decided to award the bounty to Matt. I weighed Sean Lange's answer highly. Thank you, Sean. I liked Matt's answer for its simplicity and brevity. Approaches other than Matt's and Sean's could also be used in parallel, so for the time being I am not accepting any answer (update: finally, after a few months, I have accepted Matt's answer). I wish to thank Ahmed Saeed for his idea with VALUES, for it is a nice evolution of the answer I began with. Of course, it is no match for Matt's or Sean's. I upvoted every answer. I will appreciate any feedback from you on using these methods. Thank you for the quest.
The STRING_SPLIT(string, separator) function in SQL Server splits the string in the first argument by the separator in the second argument. To split a sentence into words, specify the sentence as the first argument of the STRING_SPLIT() function and ' ' as the second argument.
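A minimal sketch of that call, assuming SQL Server 2016 or later with a database compatibility level of 130 or higher (STRING_SPLIT is not available below that). The variable names are made up for the example.

```sql
declare @sentence varchar(100) = 'split this sentence into words';

-- One row per word, returned in a single column named value.
select value
from string_split(@sentence, ' ');

-- The same call applied to the matrix string yields one row per matrix row.
declare @str varchar(max) = 'A,B,C;D,E,F;X,Y,Z';

select value as RowString
from string_split(@str, ';');
```

The two-argument form does not promise any particular output order, which matters for the matrix problem because row and column positions have to be preserved.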
You can do it using any of the following methods: (1) convert the delimited string into XML, use XQuery to split it, and save the result into a table; (2) create a user-defined table-valued function that splits the string, and insert its output into a table; (3) split the string using the STRING_SPLIT function and insert the output into a table.
OK, this puzzle intrigued me, so I decided to see if I could do this without any looping. There are a couple of prerequisites for this to work. The first is that we will assume you have some sort of tally table. In case you don't have one, here is the code for mine. I keep this on every system I use.
create View [dbo].[cteTally] as
WITH
E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
)
select N from cteTally
The second piece of this puzzle is a set-based string splitter. My preference for this is the uber fast Jeff Moden splitter. One caveat is that it only works with varchar values up to 8,000 characters. This is plenty for most delimited strings I work with. You can find Jeff Moden's splitter (DelimitedSplit8K) here.
http://www.sqlservercentral.com/articles/Tally+Table/72993/
Last but not least is that the technique I am using here is a dynamic cross tab. This is something else I learned from Jeff Moden. He has a great article on the subject here.
http://www.sqlservercentral.com/articles/Crosstab/65048/
Putting all of this together you can come up with something like this which will be really fast and will scale well.
declare @str varchar(max)='A,B,C;D,E,F;X,Y,Z';
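-- nvarchar(max) avoids truncating a long @str; single quotes in @str are doubled before embedding
declare @StaticPortion nvarchar(max) =
'declare @str varchar(max)=''' + replace(@str, '''', '''''') + ''';with OrderedResults as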
(
select s.ItemNumber
, s.Item as DelimitedValues
, x.ItemNumber as RowNum
, x.Item
from dbo.DelimitedSplit8K(@str, '';'') s
cross apply dbo.DelimitedSplit8K(s.Item, '','') x
)
select '
declare @DynamicPortion nvarchar(max) = '';
declare @FinalStaticPortion nvarchar(2000) = ' from OrderedResults group by ItemNumber';
select @DynamicPortion = @DynamicPortion +
', MAX(Case when RowNum = ' + CAST(N as varchar(6)) + ' then Item end) as Column' + CAST(N as varchar(6)) + CHAR(10)
from cteTally t
where t.N <= (select MAX(len(Item) - LEN(replace(Item, ',', ''))) + 1
from dbo.DelimitedSplit8K(@str, ';')
)
declare @SqlToExecute nvarchar(max) = @StaticPortion + stuff(@DynamicPortion, 1, 1, '') + @FinalStaticPortion
exec sp_executesql @SqlToExecute
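To make the dynamic part less opaque, here is a sketch of the statement the code above builds for the three-column sample, written out by hand. It assumes dbo.DelimitedSplit8K already exists. The alias ColNum is used here only for readability (the dynamic version calls it RowNum), and the ORDER BY is an addition to keep the output rows in their original order.

```sql
declare @str varchar(8000) = 'A,B,C;D,E,F;X,Y,Z';

with OrderedResults as
(
    select s.ItemNumber          -- position of the row in the string
         , x.ItemNumber as ColNum -- position of the value within its row
         , x.Item
    from dbo.DelimitedSplit8K(@str, ';') s
    cross apply dbo.DelimitedSplit8K(s.Item, ',') x
)
-- Cross tab: one MAX(CASE ...) per output column, one group per input row.
select MAX(case when ColNum = 1 then Item end) as Column1
     , MAX(case when ColNum = 2 then Item end) as Column2
     , MAX(case when ColNum = 3 then Item end) as Column3
from OrderedResults
group by ItemNumber
order by ItemNumber;
```

The dynamic version only automates the middle part: it emits one MAX(CASE ...) line for each N in the tally table up to the widest row.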
--EDIT--
Here is the DelimitedSplit8K function in case the link becomes invalid.
CREATE FUNCTION [dbo].[DelimitedSplit8K]
--===== Define I/O parameters
(@pString VARCHAR(8000), @pDelimiter CHAR(1))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...
-- enough to cover VARCHAR(8000)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT 0 UNION ALL
SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT t.N+1
FROM cteTally t
WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0)
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),
Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000))
FROM cteStart s
;
One of the easier ways is to convert the string to XML based on replacing your delimiters.
declare @str varchar(max)='A,B,C;D,E,F;X,Y,Z';
DECLARE @xmlstr XML
SET @xmlstr = CAST(('<rows><row><col>' + REPLACE(REPLACE(@str,';','</col></row><row><col>'),',','</col><col>') + '</col></row></rows>') AS XML)
SELECT
t.n.value('col[1]','CHAR(1)') as Col1
,t.n.value('col[2]','CHAR(1)') as Col2
,t.n.value('col[3]','CHAR(1)') as Col3
FROM
@xmlstr.nodes ('/rows/row') AS t(n)
The resulting XML has this shape:
<rows><row><col></col><col></col></row><row><col></col><col></col></row></rows>
Basically, you add the beginning and ending tags, then replace the column delimiter with column tags and the row delimiter with both column and row tags.
as t(n) tells you how you will end up accessing the XML rows and columns: t is the table alias and n is the node alias (kind of like a row), so t.n.value() reads a value from a particular row.
col[1] means "get the first col tag in the row". It is 1-based, so 2 is the next, then 3, and so on.
CHAR(1) is a data type definition meaning one character; it was based on your example data having only one character per column. You may have noticed that I made it NVARCHAR(MAX) in the dynamic query, because if the data type is unknown you will want more flexibility.
Or dynamically:
DECLARE @str varchar(max)='A,B,C,D,E;F,G,H,I,J;K,L,M,N,O';
DECLARE @NumOfColumns INT
SET @NumOfColumns = (LEN(@str) - LEN(REPLACE(@str,',',''))) / (LEN(@str) - LEN(REPLACE(@str,';','')) + 1) + 1
DECLARE @xmlstr XML
SET @xmlstr = CAST(('<rows><row><col>' + REPLACE(REPLACE(@str,';','</col></row><row><col>'),',','</col><col>') + '</col></row></rows>') AS XML)
DECLARE @ParameterDef NVARCHAR(MAX) = N'@XMLInputString xml'
DECLARE @SQL NVARCHAR(MAX) = 'SELECT '
DECLARE @i INT = 1
WHILE @i <= @NumOfColumns
BEGIN
SET @SQL = @SQL + IIF(@i > 1,',','') + 't.n.value(''col[' + CAST(@i AS VARCHAR(10)) + ']'',''NVARCHAR(MAX)'') as Col' + CAST(@i AS VARCHAR(10))
SET @i = @i + 1
END
SET @SQL = @SQL + ' FROM
@XMLInputString.nodes (''/rows/row'') AS t(n)'
EXECUTE sp_executesql @SQL,@ParameterDef,@XMLInputString = @xmlstr
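The column count in the dynamic version comes from simple arithmetic on delimiter counts. The sketch below just evaluates the pieces separately for the sample string; the column aliases are illustrative. It relies on every row having the same number of columns, so a ragged string would give a misleading count.

```sql
declare @str varchar(max) = 'A,B,C,D,E;F,G,H,I,J;K,L,M,N,O';

select
    -- number of column delimiters in the whole string: 12
    len(@str) - len(replace(@str, ',', ''))     as CommaCount,
    -- number of rows: 2 row delimiters + 1 = 3
    len(@str) - len(replace(@str, ';', '')) + 1 as NumOfRows,
    -- commas per row + 1: 12 / 3 + 1 = 5
    (len(@str) - len(replace(@str, ',', '')))
        / (len(@str) - len(replace(@str, ';', '')) + 1) + 1 as NumOfColumns;
```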
The code below should work in SQL Server. It uses common table expressions and dynamic SQL with a little manipulation. Just assign the string value to the @str variable and execute the complete code in one go. Since it uses CTEs, it is easy to analyze the data at each step.
Declare @Str varchar(max)= 'A,B,C;D,E,F;X,Y,Z';
IF OBJECT_ID('tempdb..#RawData') IS NOT NULL
DROP TABLE #RawData;
;WITH T_String AS
(
SELECT RIGHT(@Str,LEN(@Str)-CHARINDEX(';',@Str,1)) AS RawString, LEFT(@Str,CHARINDEX(';',@Str,1)-1) AS RowString, 1 AS CounterValue, len(@Str) - len(replace(@Str, ';', '')) AS RowSize
--
UNION ALL
--
SELECT IIF(CHARINDEX(';',RawString,1)=0,NULL,RIGHT(RawString,LEN(RawString)-CHARINDEX(';',RawString,1))) AS RawString, IIF(CHARINDEX(';',RawString,1)=0,RawString,LEFT(RawString,CHARINDEX(';',RawString,1)-1)) AS RowString, CounterValue+1 AS CounterValue, RowSize AS RowSize
FROM T_String AS r
WHERE CounterValue <= RowSize
)
,T_Columns AS
(
SELECT RowString AS RowValue, RIGHT(a.RowString,LEN(a.RowString)-CHARINDEX(',',a.RowString,1)) AS RawString,
LEFT(a.RowString,CHARINDEX(',',a.RowString,1)-1) AS RowString, 1 AS CounterValue, len(a.RowString) - len(replace(a.RowString, ',', '')) AS RowSize
FROM T_String AS a
--WHERE a.CounterValue = 1
--
UNION ALL
--
SELECT RowValue, IIF(CHARINDEX(',',RawString,1)=0,NULL,RIGHT(RawString,LEN(RawString)-CHARINDEX(',',RawString,1))) AS RawString, IIF(CHARINDEX(',',RawString,1)=0,RawString,LEFT(RawString,CHARINDEX(',',RawString,1)-1)) AS RowString, CounterValue+1 AS CounterValue, RowSize AS RowSize
FROM T_Columns AS r
WHERE CounterValue <= RowSize
)
,T_Data_Prior2Pivot AS
(
SELECT c.RowValue, c.RowString, c.CounterValue
FROM T_Columns AS c
INNER JOIN
T_String AS r
ON r.RowString = c.RowValue
)
SELECT *
INTO #RawData
FROM T_Data_Prior2Pivot;
DECLARE @columnNames VARCHAR(MAX)
,@sqlQuery VARCHAR(MAX)
SELECT @columnNames = COALESCE(@columnNames+', ['+CAST(CounterValue AS VARCHAR)+']','['+CAST(CounterValue AS VARCHAR)+']') FROM (SELECT DISTINCT CounterValue FROM #RawData) T
PRINT @columnNames
SET @sqlQuery = '
SELECT '+@columnNames+'
FROM ( SELECT * FROM #RawData
) AS b
PIVOT (MAX(RowString) FOR CounterValue IN ('+@columnNames+')) AS p
'
EXEC (@sqlQuery);
Below is a statistics screenshot for the query above, from http://statisticsparser.com/.