I have a list of pets:
And I need to find a correct owner for each of the pet from Owner table
In order to correctly match each pet to an owner I need to use a special matching table that looks like this:
So, for pet with PetID=2 I need to find an owner that has a matched based on three fields:
Pet.Zip = Owner.Zip
and Pet.OwnerName = Owner.Name
and Pet.Document = Owner.Document
In our example, it will work like this:
select top 1 OwnerID from owners
where Zip = 23456
and Name = 'Alex'
and Document = 'a.csv'
if OwnerID is not found I then need to match based on 2 fields (Not using field with the highest priority)
In our example:
select top 1 OwnerID from owners where
Name = 'Alex'
and Document = 'a.csv'
Since no record is found I we then need to match on less fields. In our example:
select top 1 OwnerID from owners where Document = 'a.csv'
Now, we found an owner with OwnerID = 6.
Now we need to update pet with ownerID = 6 and then we can process next pet.
The only way that I can do this right now involves a loop or a cursor + dynamic SQL.
Is it possible to achieve this without loops+dynamic sql? Maybe STUFF + Pivot somehow?
sql fiddle: http://sqlfiddle.com/#!18/10982/1/0
Sample data:
create table temp_builder
(
PetID int not null,
Field varchar(30) not null,
MatchTo varchar(30) not null,
Priority int not null
)
insert into temp_builder values
(1,'Address', 'Addr',4),
(1,'Zip', 'Zip', 3),
(1,'Country', 'Country', 2),
(1,'OwnerName', 'Name',1),
(2,'Zip', 'Zip',3),
(2,'OwnerName','Name', 2),
(2,'Document', 'Document', 1),
(3,'Country', 'Country', 1)
create table temp_pets
(
PetID int null,
Address varchar(100) null,
Zip int null,
Country varchar(100) null,
Document varchar(100) null,
OwnerName varchar(100) null,
OwnerID int null,
Field1 bit null,
Field2 bit null
)
insert into temp_pets values
(1, '123 5th st', 12345, 'US', 'test.csv', 'John', NULL, NULL, NULL),
(2, '234 6th st', 23456, 'US', 'a.csv', 'Alex', NULL, NULL, NULL),
(3, '345 7th st', 34567, 'US', 'b.csv', 'Mike', NULL, NULL, NULL)
create table temp_owners
(
OwnerID int null,
Addr varchar(100) null,
Zip int null,
Country varchar(100) null,
Document varchar(100) null,
Name varchar(100) null,
OtherField bit null,
OtherField2 bit null,
)
insert into temp_owners values
(1, '456 8th st', 45678, 'US', 'c.csv', 'Mike', NULL, NULL),
(2, '678 9th st', 45678, 'US', 'b.csv', 'John', NULL, NULL),
(3, '890 10th st', 45678, 'US', 'b.csv', 'Alex', NULL, NULL),
(4, '901 11th st', 23456, 'US', 'b.csv', 'Alex', NULL, NULL),
(5, '234 5th st', 12345, 'US', 'b.csv', 'John', NULL, NULL),
(6, '123 5th st', 45678, 'US', 'a.csv', 'John', NULL, NULL)
Edit: I'm overwhelmed by a number of great suggestions and responses. I've tested them and many worked well for me. Unfortunately, I can only award bounty to one solution.
The use of cursor, loops and dynamic SQL can be avoided by treating the fields used for comparison as an entry in a bit set for each pet. A bit set (FieldSetRank column) is calculated for each priority based on a bit entry (FieldRank rank column).
The Pets and Owner tables has to be unpivoted so that the fields and their associated values can be compared. Each of the fields and value that has been matched is assigned to a corresponding FieldRank. A new bit set is then calculated based on the matched values (MatchSetRank). Only records where the matched sets (MatchSetRank) are equal to the desired sets (FieldSetRank) are returned.
The query performs one final ranking to return records with the highest MatchSetRank (records that matched on the highest number of columns while maintaining priority criteria. The following T-SQL demonstrates the concept.
;WITH CTE_Builder
AS
(
SELECT [PetID]
,[Field]
,[Priority]
,[MatchTo]
,POWER(2, [Priority] - 1) AS [FieldRank] -- Define the field ranking as bit set numbered item.
,SUM(POWER(2, [Priority] - 1)) OVER (PARTITION BY [PetID] ORDER BY [Priority] ROWS UNBOUNDED PRECEDING) FieldSetRank -- Sum all the bit set IDs to define what constitutes a completed field set ordered by priority.
FROM temp_builder
),
CTE_PetsUnpivoted
AS
( -- Unpivot pets table and assign Field Rank and Field Set Rank.
SELECT [PetsUnPivot].[PetID]
,[PetsUnPivot].[Field]
,[Builder].[MatchTo]
,[PetsUnPivot].[FieldValue]
,[Builder].[Priority]
,[Builder].[FieldRank]
,[Builder].[FieldSetRank]
FROM
(
SELECT [PetID], [Address], CAST([Zip] AS VARCHAR(100)) AS [Zip], [Country], [Document], [OwnerName]
FROM temp_pets
) [Pets]
UNPIVOT
(FieldValue FOR Field IN
([Address], [Zip], [Country], [Document], [OwnerName])
) AS [PetsUnPivot]
INNER JOIN [CTE_Builder] [Builder] ON [PetsUnPivot].PetID = [Builder].PetID AND [PetsUnPivot].Field = [Builder].Field
),
CTE_Owners
AS
(
-- Unpivot Owners table and join with unpivoted Pets table on field name and field value.
-- Next assign Pets field rank then calculated the field set rank (MatchSetRank) based on actual matches made.
SELECT [OwnersUnPivot].[OwnerID]
,[Pets].[PetID]
,[OwnersUnPivot].[Field]
,[Pets].Field AS [PetField]
,[Pets].FieldValue as PetFieldValue
,[OwnersUnPivot].[FieldValue]
,[Pets].[Priority]
,[Pets].[FieldRank]
,[Pets].[FieldSetRank]
,SUM([FieldRank]) OVER (PARTITION BY [Pets].[PetID], [OwnersUnPivot].[OwnerID] ORDER BY [Pets].[Priority] ROWS UNBOUNDED PRECEDING) MatchSetRank
FROM
(
SELECT [OwnerID], [Addr], CAST([Zip] AS VARCHAR(100)) AS [Zip], [Country], [Document], [Name]
FROM temp_owners
) [Owners]
UNPIVOT
(FieldValue FOR Field IN
([Addr], [Zip], [Country], [Document], [Name])
) AS [OwnersUnPivot]
INNER JOIN [CTE_PetsUnpivoted] [Pets] ON [OwnersUnPivot].[Field] = [Pets].[MatchTo] AND [OwnersUnPivot].[FieldValue] = [Pets].[FieldValue]
),
CTE_FinalRanking
AS
(
SELECT [PetID]
,[OwnerID]
-- -- Calculate final rank, if multiple matches have the same rank then multiple rows will be returned per pet.
-- Change the “RANK()” function to "ROW_NUMBER()" to only return on result per pet.
,RANK() OVER (PARTITION BY [PetID] ORDER BY [MatchSetRank] DESC) AS [FinalRank]
FROM CTE_Owners
WHERE [FieldSetRank] = [MatchSetRank] -- Only return records where the field sets calculated based on
-- actual matches is equal to desired field set ranks. This will
-- eliminate matches where the number of fields that meets the
-- criteria is the same but does not meet priority requirements.
)
SELECT [PetID]
,[OwnerID]
FROM CTE_FinalRanking
WHERE [FinalRank] = 1
I will say right away to save your time:
Given this, my approach is fair straightforward:
There is an outer loop that iterates from the biggest set of matchers (all matching fields) to the smallest set of matchers (one field). On the first iteration, when we don't know yet how many matchers are stored in the database for the pet, we read and use them all. On the next iterations, we decrease number of used matchers by 1 (removing the one with highest priority).
The inner loop iterates over current set of matchers and builds the WHERE
clause that compares fields between Pets
and Owners
table.
Current query is executed and if some owner matches given criteria, we break from the outer loop.
Here is the code that implements this logic:
DECLARE @PetId INT = 2;
DECLARE @MatchersLimit INT;
DECLARE @OwnerID INT;
WHILE (@MatchersLimit IS NULL OR @MatchersLimit > 0) AND @OwnerID IS NULL
BEGIN
DECLARE @CurrMatchFilter VARCHAR(max) = ''
DECLARE @Field VARCHAR(30)
DECLARE @MatchTo VARCHAR(30)
DECLARE @CurrMatchersNumber INT = 0;
DECLARE @GetMatchers CURSOR;
IF @MatchersLimit IS NULL
SET @GetMatchers = CURSOR FOR SELECT Field, MatchTo FROM temp_builder WHERE PetID = @PetId ORDER BY Priority ASC;
ELSE
SET @GetMatchers = CURSOR FOR SELECT TOP (@MatchersLimit) Field, MatchTo FROM temp_builder WHERE PetID = @PetId ORDER BY Priority ASC;
OPEN @GetMatchers;
FETCH NEXT FROM @GetMatchers INTO @Field, @MatchTo;
WHILE @@FETCH_STATUS = 0
BEGIN
IF @CurrMatchFilter <> '' SET @CurrMatchFilter = @CurrMatchFilter + ' AND ';
SET @CurrMatchFilter = @CurrMatchFilter + ('temp_pets.' + @Field + ' = ' + 'temp_owners.' + @MatchTo);
FETCH NEXT FROM @GetMatchers INTO @field, @matchTo;
SET @CurrMatchersNumber = @CurrMatchersNumber + 1;
END
CLOSE @GetMatchers;
DEALLOCATE @GetMatchers;
IF @CurrMatchersNumber = 0 BREAK;
DECLARE @CurrQuery nvarchar(max) = N'SELECT @id = temp_owners.OwnerID FROM temp_owners INNER JOIN temp_pets ON (' + CAST(@CurrMatchFilter AS NVARCHAR(MAX)) + N') WHERE temp_pets.PetID = ' + CAST(@PetId AS NVARCHAR(MAX));
EXECUTE sp_executesql @CurrQuery, N'@id int OUTPUT', @id=@OwnerID OUTPUT;
IF @MatchersLimit IS NULL
SET @MatchersLimit = @CurrMatchersNumber - 1;
ELSE
SET @MatchersLimit = @MatchersLimit - 1;
END
SELECT @OwnerID AS OwnerID, @MatchersLimit + 1 AS Matched;
Performance considerations
There are basically 2 queries that are executed in this approach:
SELECT Field, MatchTo FROM temp_builder WHERE PetID = @PetId;
You should add an index on PetID
field in temp_builder
table and this query will be executed very fast.
SELECT @id = temp_owners.OwnerID FROM temp_owners INNER JOIN temp_pets ON (temp_pets.Document = temp_owners.Document AND temp_pets.OwnerName = temp_owners.Name AND temp_pets.Zip = temp_owners.Zip AND ...) WHERE temp_pets.PetID = @PetId;
This query looks scary because it joins two big tables - temp_owners
and temp_pets
. However temp_pets
table is filtered by PetID
column that should result in just one record. So if you have an index on temp_pets.PetID
column (and you should as this column seems like a primary key), the query will result into scan of temp_owners
table. Such scan will not take the ages even for table with over 1M rows. If the query is still too slow, you could consider adding indexes for columns of temp_owners
table that are used in the matchers (Addr
, Zip
, etc.). Adding indexes has downsides, like bigger database and slower insert/update operations. So before adding the indexes on temp_owners
columns, check the query speed on table without indexes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With