Good Day, I would like to implement a T-SQL query for the Set Cover Problem but have been unable to find any hints on how to do this in SQL.
In my case, my table just has two columns (IDnumber
and Mut
) and I want to find the minimum number of IDNumber
to get one of every Mut
. I really want to obtain three IDnumbers
for every Mut
but I figured I better start off with just one because that might be easier.
DECLARE @myTable TABLE (
IDnumber int,
Mut varchar(1))
INSERT @myTable VALUES
(1,'C'), (1,'N'), (1,'Z'), (1,'M'), (1,'E'), (2,'E'), (3,'B'), (3,'N'), (3,'D'), (3,'K'),
(3,'W'), (4,'O'), (4,'G'), (4,'N'), (4,'B'), (4,'U'), (4,'C'), (5,'Q'), (5,'H'), (6,'K'),
(6,'Y'), (6,'M'), (6,'A'), (6,'O'), (6,'U'), (6,'J'), (7,'H'), (7,'U'), (7,'M'), (7,'L'),
(8,'B'), (8,'K'), (8,'P'), (9,'Y'), (9,'K'), (10,'Z'), (11,'R'), (12,'X'), (12,'R'),
(12,'O'), (12,'Z'), (4,'C'), (1,'Z'), (4,'S'), (6,'E'), (5,'G'), (4,'C'), (4,'S'), (4,'H'),
(6,'D'), (7,'W'), (3,'U'), (6,'N'), (7,'Y'), (6,'N'), (6,'F'), (4,'C'), (4,'I'), (7,'P'),
(10,'H'), (10,'Z'), (10,'S'), (7,'Z'), (6,'B'), (7,'Z'), (8,'X'), (8,'J'), (8,'P'), (10,'K'),
(8,'K'), (12,'P'), (8,'W'), (10,'M'), (12,'F'), (9,'T'), (9,'D'), (14,'Y'), (12,'P'),
(14,'J'), (13,'D'), (15,'H'), (12,'J'), (6,'H'), (2,'Z'), (8,'G'), (10,'Q'), (6,'D'),
(5,'X'), (9,'T'), (6,'W'), (6,'K'), (10,'W'), (7,'J'), (11,'W'), (12,'V'), (9,'F'), (7,'F'),
(4,'M'), (5,'K'), (12,'G'), (12,'T'), (15,'T'), (13,'W'), (7,'J'), (9,'T'), (10,'U'), (9,'S'),
(10,'L'), (10,'J'), (10,'H'), (11,'H'), (12,'S'), (12,'A'), (14,'L'), (13,'K'), (13,'D'),
(4,'M'), (3,'N'), (4,'F'), (7,'M'), (7,'V'), (5,'R'), (4,'K'), (5,'F'), (7,'G'), (8,'M'),
(4,'X'), (7,'F'), (9,'S'), (7,'N'), (6,'W'), (6,'W'), (5,'S'), (9,'Z'), (10,'I'), (11,'Y'),
(11,'D'), (9,'X'), (7,'G'), (9,'S'), (9,'H'), (9,'T'), (8,'J'), (10,'U'), (9,'F'), (9,'S'),
(7,'D'), (14,'R'), (10,'F'), (7,'E'), (15,'M'), (12,'F'), (5,'C'), (8,'E'), (16,'G'), (11,'V'),
(10,'I'), (12,'I'), (11,'Y'), (12,'I'), (14,'J'), (15,'D'), (19,'J'), (16,'B'), (12,'G'),
(9,'J'), (18,'J'), (18,'C'), (16,'Q'), (18,'P'), (13,'F'), (19,'T'), (15,'J'), (15,'R'),
(15,'Q'), (15,'O'), (11,'A'), (24,'B'), (19,'S'), (22,'I'), (15,'X'), (20,'T'), (15,'E'),
(9,'V'), (8,'H'), (16,'N'), (17,'H')
-- Since the above list was generated by a bunch of random numbers/letters I need to
-- delete the duplicates
;WITH cte AS (
SELECT IDnumber, mut,
row_number() OVER(PARTITION BY IDNumber, mut ORDER BY IDNumber) AS [rn]
FROM @myTable
)
DELETE cte WHERE [rn] > 1
SELECT *
FROM ( SELECT IDnumber, Mut FROM @myTable) AS S
PIVOT
(COUNT(Mut) FOR mut IN ([A],[B],[C],[D],[E],[F],[G],[H],[I],[J],[K],[L],[M],[N],[O],[P],
[Q],[R],[S],[T],[U],[V],[W],[X],[Y],[Z])) AS pvt
So you can see from the pivot table that the minimum IDnumbers
would be 3,5,7, and 12.
How would one go about implementing the algorithm? It seems to me that I could find all the combinations (2^6) which would be sets and then determine which sets have all the Muts. The set with the smallest number of IDnumbers is then the minimal set.
That kind of brute force may work but it would be horribly inefficient. My real world case is not enormous, I have 43 unique Muts
(not the nine in the example) and ~2000 IDnumbers
, but I am thinking that would take some time to run because 2^2000 is pretty darn big...
Thanks!
Here's an approach which calculates a bitmask of Mut
values for each IDNumber
, then uses a recursive CTE to generate all the possible combinations which complete the set. The code outputs all the possible IdNumber
combinations which give the final result, excluding those where one or more IdNumber
is redundant in a combination (such as 1,2,3,4,5,6
in the sample data set).
To convert this to work with your real data, the major difference is going to be that you'll almost certainly need to find another mechanism to assign bit values to Mut
values. (I can cheat a bit here and calculate the bit values by manipulating the decimal ASCII code for each Mut
letter - POWER(2,ASCII(Mut) - 64)
).
DECLARE @myTable TABLE (
IDnumber int,
Mut varchar(1))
INSERT @myTable VALUES
(1,'A'),(1,'B'),(1,'C'),(1,'D'),
(2,'A'),(2,'C'),(2,'D'),
(3,'A'),(3,'B'),(3,'F'),(3,'Z'),
(4,'A'),(4,'B'),(4,'E'),(4,'F'),
(5,'Y'),
(6,'X'),(6,'Z')
-- calculate the target bitmask
DECLARE @target bigint = ( SELECT SUM(POWER(2,ASCII(Mut) - 64))
FROM (SELECT DISTINCT Mut FROM @myTable) AS x
)
;WITH baseCTE
AS
(
--calculate a starting bitmask for each ID
SELECT IDnumber, sum(mutbit) AS bitmask
FROM (SELECT DISTINCT IDnumber, CAST(POWER(2,ASCII(Mut) - 64) AS bigint) AS mutbit FROM @myTable) AS x
GROUP BY IDnumber
)
,recCTE
AS
(
SELECT IDnumber, bitmask, CAST(IdNumber AS varchar(max)) AS trail, 1 AS lev
FROM baseCTE
UNION ALL
SELECT b.IDnumber, b.bitmask | r.bitmask, CAST(CONCAT(r.trail,',',b.IDnumber) AS varchar(max)), r.lev + 1
FROM recCTE AS r
JOIN baseCTE AS b
ON b.IDnumber > r.IDnumber
WHERE b.bitmask | r.bitmask <> r.bitmask --ignore combinations which don't add any new Mut values
)
SELECT trail, lev
FROM recCTE
WHERE bitmask = @target
ORDER BY lev
NB. SQL Server bitwise operators only work where one or other value is of an integer type, so this solution is restricted to 64 distinct Mut
values (where the mask is a bigint
) - to get it to work beyond that, you'd have to use a custom bitwise comparator (possibly using the CLR). However, since the question mentions that you have 43 Mut
values, it might work for you for now.
I would like a larger dataset to test on, but this matches your results for at least the dataset given...
DECLARE @myTable TABLE (
IDnumber INT,
Mut VARCHAR(1))
DECLARE @results TABLE (
IDnumber INT)
INSERT @myTable VALUES
(1,'A'),(1,'B'),(1,'C'),(1,'D'),
(2,'A'),(2,'C'),(2,'D'),
(3,'A'),(3,'B'),(3,'F'),(3,'Z'),
(4,'A'),(4,'B'),(4,'E'),(4,'F'),
(5,'Y'),
(6,'X'),(6,'Z')
DECLARE @IDnumber INT
WHILE EXISTS (SELECT 1 FROM @myTable)
BEGIN
WITH MutRank AS
(
-- Find how many IDNumbers are associated with each mut
SELECT Mut,
COUNT(DISTINCT IDnumber) AS IDnumberCnt
FROM @myTable
GROUP BY Mut
), MutRarity AS
(
-- Translate the IDNumberCnt into a weighted rarity so dupes at
-- a IDNumberCnt level reduce their rarity compared to the next level
SELECT Mut,
RANK() OVER (ORDER BY IDnumberCnt DESC) AS MutRarity
FROM MutRank
)
-- Grab the IDNumber that is associated to the most "rare" muts
SELECT @IDnumber = n.IDnumber
FROM (SELECT TOP 1 m.IDnumber,
SUM(MutRarity) AS IDNumberValue
FROM @myTable m
JOIN MutRarity mr
ON m.Mut = mr.Mut
GROUP BY m.IDnumber
ORDER BY IDNumberValue DESC, IDnumber) n
-- Save the number in our results
INSERT @results (IDnumber)
SELECT @IDnumber
-- Purge all records for the "covered" muts and the selected IDNumber
DELETE m2
FROM @myTable m1
JOIN @myTable m2
ON m1.Mut = m2.Mut
AND m1.IDnumber = @IDnumber
END
SELECT *
FROM @results
ORDER BY IDnumber
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With