I have a table containing a list of words and associated wordForms. The typical data in the table looks like this below. Note some of the columns of WordForms end in , e.g. and some just end with the last wordform word
Id Word WordForms
1 abandon abandoned, abandoning, abandonment, abandons
2 abstract abstraction, abstractions, abstractly, abstracts, e.g.
Here is the layout of the source data table:
CREATE TABLE [dbo].[TempWords]
(
[Id] INT IDENTITY (1, 1) NOT NULL,
[Word] NVARCHAR (MAX) NOT NULL,
[WordForms] NVARCHAR (MAX) NULL,
)
I would like to use this data to populate two tables. I know about using SQL INSERT INTO but that will I think only help me with the one table. What I would like to do is to take the first Word, put that into the Words table and then separate the wordforms that are now divided up by a comma and put those into the WordForms Table.
CREATE TABLE [dbo].[Words]
(
[WordId] INT IDENTITY (1, 1) NOT NULL,
[Word] NVARCHAR (MAX) NOT NULL
)
CREATE TABLE [dbo].[WordForms]
(
[Id] INT IDENTITY (1, 1) NOT NULL,
[WordId] INT NOT NULL,
[Text] NVARCHAR (MAX) NULL,
)
Can anyone give me some tips as to how I can do this?
First of all, you can create UDF function to separate CSV values.
CREATE FUNCTION dbo.fn_Split (
@InputString VARCHAR(8000),
@Delimiter VARCHAR(50)
)
RETURNS @Items TABLE (
Item VARCHAR(8000)
)
AS
BEGIN
IF @Delimiter = ' '
BEGIN
SET @Delimiter = ','
SET @InputString = REPLACE(@InputString, ' ', @Delimiter)
END
IF (@Delimiter IS NULL OR @Delimiter = '')
SET @Delimiter = ','
--INSERT INTO @Items VALUES (@Delimiter) -- Diagnostic
--INSERT INTO @Items VALUES (@InputString) -- Diagnostic
DECLARE @Item VARCHAR(8000)
DECLARE @ItemList VARCHAR(8000)
DECLARE @DelimIndex INT
SET @ItemList = @InputString
SET @DelimIndex = CHARINDEX(@Delimiter, @ItemList, 0)
WHILE (@DelimIndex != 0)
BEGIN
SET @Item = SUBSTRING(@ItemList, 0, @DelimIndex)
INSERT INTO @Items VALUES (@Item)
-- Set @ItemList = @ItemList minus one less item
SET @ItemList = SUBSTRING(@ItemList, @DelimIndex+1, LEN(@ItemList)-@DelimIndex)
SET @DelimIndex = CHARINDEX(@Delimiter, @ItemList, 0)
END -- End WHILE
IF @Item IS NOT NULL -- At least one delimiter was encountered in @InputString
BEGIN
SET @Item = @ItemList
INSERT INTO @Items VALUES (@Item)
END
-- No delimiters were encountered in @InputString, so just return @InputString
ELSE INSERT INTO @Items VALUES (@InputString)
RETURN
END -- End Function
GO
Then you can use below INSERT statements to populate tables.
INSERT INTO [Words]
SELECT Word FROM TempWords
INSERT INTO WordForms
SELECT
W.WordId,
LTRIM(RTRIM(FNT.Item)) AS Item
FROM TempWords AS TW
INNER JOIN [Words] AS W
ON TW.[Word]=W.[Word]
CROSS APPLY fn_Split(REPLACE(TW.WordForms,', e.g.',''),',') AS FNT
SELECT * FROM [Words]
SELECT * FROM WordForms
You can insert words into first table, than parse word forms and insert them into child table with link to parent table.
Link to parent table can be obtained by joining on word
column (I guess it's unique) or by doing some MERGE + OUTPUT
thing to obtain in one step SOURCE.ID
(from @words_csv) and INSERTED.ID
. Any way you like.
Parsing could be implemented in many ways too, check this example (actually I would not recommend to parse with sql at all).
DECLARE @words_csv TABLE (Id INT IDENTITY(1, 1), Word VARCHAR(100), WordForms VARCHAR(1000))
INSERT INTO @words_csv(word, wordforms)
VALUES
('abandon', 'abandoned, abandoning, abandonment, abandons, e.g.'),
('abstract', 'abstraction, abstractions, abstractly, abstracts')
--INSERT INTO [dbo].[Words](word)
--SELECT w.word
--FROM @words_csv w
;WITH word_forms_extracted AS
(
SELECT w.id,
w.word,
ltrim(rtrim(cast(case when CHARINDEX(',', w.WordForms) > 0 then substring(w.wordforms, 1, CHARINDEX(',', w.WordForms)-1) end AS VARCHAR(1000)))) wordform,
stuff(w.wordforms, 1, CHARINDEX(',', w.WordForms), '') wordforms
FROM @words_csv w
UNION ALL
SELECT w.id,
w.word,
ltrim(rtrim(cast(case when CHARINDEX(',', wfe.WordForms) > 0 then substring(wfe.wordforms, 1, CHARINDEX(',', wfe.WordForms)-1) else wfe.wordforms end AS VARCHAR(1000)))) wordform,
case when CHARINDEX(',', wfe.WordForms) > 0 then stuff(wfe.wordforms, 1, CHARINDEX(',', wfe.WordForms), '') ELSE '' end wordforms
FROM @words_csv w
INNER JOIN word_forms_extracted wfe
ON wfe.id = w.id
WHERE wfe.wordforms != ''
)
SELECT wf.id, wf.word, wf.wordform
FROM word_forms_extracted wf
--INNER JOIN [dbo].[Words] w
--ON w.word = wf.word
WHERE wf.wordform NOT IN ('', 'e.g.')
ORDER BY wf.id, wf.wordform
OPTION(MAXRECURSION 1000)
Final SELECT
can be easily modified to INSERT INTO dbo.WordForms (...) SELECT ...
Link to dbo.Words
is obtained here as you can see by joining on word
column.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With