I have a table with a few hundred thousand rows and the data format is index (int), and a words nvarchar(1000). The words string is made up of a collection of words separated by a space, e.g word1 word2 word3. I want to read the word table and create a dictionary. In terms of pseudo code this is what I want:
INSERT INTO dictionary (dictionaryword)
SELECt splitBySpace(words) FROM word;
This is simple enough to code in Java or C#, but I have found the system takes a long time to process the data. In other processing the cost benefit to running SQL to handle the query (i.e not processing the data in c# or Java) is huge.
I want to create a stored procedure which reads the words, splits them, and then creates the dictionary. I have seen various split procedures which are a little complex, e.g https://dba.stackexchange.com/questions/21078/t-sql-table-valued-function-to-split-a-column-on-commas but I could not see how to re-code this for the task of reading a whole database, splitting the words, and inserting them.
Has anyone any sample code to split the column data and then insert it which can wholly implemented in SQL for reasons of efficiency?
Here is the solution.
DDL:
create table sentence(t varchar(100))
insert into sentence values
('Once upon a time in America'),
('Eyes wide shut')
DML:
select distinct ca.d as words from sentence s
cross apply(select split.a.value('.', 'varchar(100)') as d
from
(select cast('<x>' + REPLACE(s.t, ' ', '</x><x>') + '</x>' as xml) as d) as a
cross apply d.nodes ('/x') as split(a)) ca
Output:
words
a
America
Eyes
in
Once
shut
time
upon
wide
Fiddle http://sqlfiddle.com/#!6/54dff/4
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With