Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL: efficiently append incremental number to string, avoiding duplicates

I have a set of records (table [#tmp_origin]) containing duplicate entries in a string field ([Names]). I would like to insert the whole content of [#tmp_origin] into the destination table [#tmp_destination], that does NOT allow duplicates and may already contain items.

If the string in the origin table does not exist in the destination table, then in is simply inserted in the destination table, as is. If an entry in the destination table already exists with the same value of the entry in the original table, a string-ified incremental number must be appended to the string, before it is inserted in the destination table.

The process of moving data in this way has been implemented with a cursor, in this sample script:

-- create initial situation (origin and destination table, both containing items)
-- Begin

CREATE TABLE [#tmp_origin] ([Names] VARCHAR(10))
CREATE TABLE [#tmp_destination] ([Names] VARCHAR(10))
CREATE UNIQUE INDEX [IX_UniqueName] ON [#tmp_destination]([Names] ASC)

INSERT INTO [#tmp_origin]([Names]) VALUES ('a')
INSERT INTO [#tmp_origin]([Names]) VALUES ('a')
INSERT INTO [#tmp_origin]([Names]) VALUES ('b')
INSERT INTO [#tmp_origin]([Names]) VALUES ('c')

INSERT INTO [#tmp_destination]([Names]) VALUES ('a')
INSERT INTO [#tmp_destination]([Names]) VALUES ('a_1')
INSERT INTO [#tmp_destination]([Names]) VALUES ('b')

-- create initial situation - End

DECLARE @Name VARCHAR(10)

DECLARE NamesCursor CURSOR LOCAL FORWARD_ONLY FAST_FORWARD READ_ONLY FOR
    SELECT [Names]
    FROM [#tmp_origin];
OPEN NamesCursor;
FETCH NEXT FROM NamesCursor INTO @Name;
 
WHILE @@FETCH_STATUS = 0
BEGIN
    DECLARE @finalName VARCHAR(10)
    SET @finalName = @Name
    DECLARE @counter INT
    SET @counter = 1

    WHILE(1=1)
    BEGIN
        IF NOT EXISTS(SELECT * FROM [#tmp_destination] WHERE [Names] = @finalName)
            BREAK;

        SET @finalName = @Name + '_' + CAST(@counter AS VARCHAR)
        SET @counter = @counter + 1
    END
    INSERT INTO [#tmp_destination] ([Names]) (
        SELECT @finalName
    )

    FETCH NEXT FROM NamesCursor INTO @Name;
END
 
CLOSE NamesCursor;
DEALLOCATE NamesCursor;

SELECT *
FROM [#tmp_destination]

/*
Expected result:
a
a_1
a_2
a_3
b
b_1
c
*/

DROP TABLE [#tmp_origin]
DROP TABLE [#tmp_destination]

This works correctly, but its performance drastically slows down when the number of items to insert increases.

Any idea to speed it up?

thanks

like image 679
Starnuto di topo Avatar asked Mar 06 '23 17:03

Starnuto di topo


1 Answers

Using a windowing function allows the duplicates to be numbered. You can also get the count from the destination table (will need where condition to strip off the suffix you've added):

select orig.names,
       row_number() over (partition by orig.names order by orig.names) as rowNo,
       dest.count
from ##tmp_origin orig
  cross apply (select count(1) from #tmp_destination where names = orig.names) as dest

An insert can be built from the above (new suffix is rowNo + dest.count -1 if greater than zero).

Suggest you refactor the destination temporary table to include the name and suffix as separate columns – this might mean having a new intermediate stage – because this will make the matching logic much simpler.

like image 85
Richard Avatar answered May 04 '23 10:05

Richard