Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Query for Set Cover

Good Day, I would like to implement a T-SQL query for the Set Cover Problem but have been unable to find any hints on how to do this in SQL.

In my case, my table just has two columns (IDnumber and Mut) and I want to find the minimum number of IDNumber to get one of every Mut. I really want to obtain three IDnumbers for every Mut but I figured I better start off with just one because that might be easier.

DECLARE @myTable TABLE (
    IDnumber int,
    Mut varchar(1))

INSERT @myTable VALUES 
 (1,'C'), (1,'N'), (1,'Z'), (1,'M'), (1,'E'), (2,'E'), (3,'B'), (3,'N'), (3,'D'), (3,'K'), 
(3,'W'), (4,'O'), (4,'G'), (4,'N'), (4,'B'), (4,'U'), (4,'C'), (5,'Q'), (5,'H'), (6,'K'), 
(6,'Y'), (6,'M'), (6,'A'), (6,'O'), (6,'U'), (6,'J'), (7,'H'), (7,'U'), (7,'M'), (7,'L'), 
(8,'B'), (8,'K'), (8,'P'), (9,'Y'), (9,'K'), (10,'Z'), (11,'R'), (12,'X'), (12,'R'), 
(12,'O'), (12,'Z'), (4,'C'), (1,'Z'), (4,'S'), (6,'E'), (5,'G'), (4,'C'), (4,'S'), (4,'H'), 
(6,'D'), (7,'W'), (3,'U'), (6,'N'), (7,'Y'), (6,'N'), (6,'F'), (4,'C'), (4,'I'), (7,'P'), 
(10,'H'), (10,'Z'), (10,'S'), (7,'Z'), (6,'B'), (7,'Z'), (8,'X'), (8,'J'), (8,'P'), (10,'K'), 
(8,'K'), (12,'P'), (8,'W'), (10,'M'), (12,'F'), (9,'T'), (9,'D'), (14,'Y'), (12,'P'), 
(14,'J'), (13,'D'), (15,'H'), (12,'J'), (6,'H'), (2,'Z'), (8,'G'), (10,'Q'), (6,'D'), 
(5,'X'), (9,'T'), (6,'W'), (6,'K'), (10,'W'), (7,'J'), (11,'W'), (12,'V'), (9,'F'), (7,'F'), 
(4,'M'), (5,'K'), (12,'G'), (12,'T'), (15,'T'), (13,'W'), (7,'J'), (9,'T'), (10,'U'), (9,'S'), 
(10,'L'), (10,'J'), (10,'H'), (11,'H'), (12,'S'), (12,'A'), (14,'L'), (13,'K'), (13,'D'), 
(4,'M'), (3,'N'), (4,'F'), (7,'M'), (7,'V'), (5,'R'), (4,'K'), (5,'F'), (7,'G'), (8,'M'), 
(4,'X'), (7,'F'), (9,'S'), (7,'N'), (6,'W'), (6,'W'), (5,'S'), (9,'Z'), (10,'I'), (11,'Y'), 
(11,'D'), (9,'X'), (7,'G'), (9,'S'), (9,'H'), (9,'T'), (8,'J'), (10,'U'), (9,'F'), (9,'S'), 
(7,'D'), (14,'R'), (10,'F'), (7,'E'), (15,'M'), (12,'F'), (5,'C'), (8,'E'), (16,'G'), (11,'V'),
(10,'I'), (12,'I'), (11,'Y'), (12,'I'), (14,'J'), (15,'D'), (19,'J'), (16,'B'), (12,'G'), 
(9,'J'), (18,'J'), (18,'C'), (16,'Q'), (18,'P'), (13,'F'), (19,'T'), (15,'J'), (15,'R'), 
(15,'Q'), (15,'O'), (11,'A'), (24,'B'), (19,'S'), (22,'I'), (15,'X'), (20,'T'), (15,'E'), 
(9,'V'), (8,'H'), (16,'N'), (17,'H')
--  Since the above list was generated by a bunch of random numbers/letters I need to
-- delete the duplicates

;WITH cte AS (
  SELECT IDnumber, mut, 
     row_number() OVER(PARTITION BY IDNumber, mut ORDER BY IDNumber) AS [rn]
  FROM @myTable
)
DELETE cte WHERE [rn] > 1


SELECT *
FROM ( SELECT IDnumber, Mut FROM @myTable) AS S
PIVOT
(COUNT(Mut) FOR mut IN ([A],[B],[C],[D],[E],[F],[G],[H],[I],[J],[K],[L],[M],[N],[O],[P],
[Q],[R],[S],[T],[U],[V],[W],[X],[Y],[Z])) AS pvt

So you can see from the pivot table that the minimum IDnumbers would be 3,5,7, and 12.

How would one go about implementing the algorithm? It seems to me that I could find all the combinations (2^6) which would be sets and then determine which sets have all the Muts. The set with the smallest number of IDnumbers is then the minimal set.

That kind of brute force may work but it would be horribly inefficient. My real world case is not enormous, I have 43 unique Muts (not the nine in the example) and ~2000 IDnumbers, but I am thinking that would take some time to run because 2^2000 is pretty darn big...

Thanks!

like image 699
user918967 Avatar asked Oct 28 '16 03:10

user918967


2 Answers

Here's an approach which calculates a bitmask of Mut values for each IDNumber, then uses a recursive CTE to generate all the possible combinations which complete the set. The code outputs all the possible IdNumber combinations which give the final result, excluding those where one or more IdNumber is redundant in a combination (such as 1,2,3,4,5,6 in the sample data set).

To convert this to work with your real data, the major difference is going to be that you'll almost certainly need to find another mechanism to assign bit values to Mut values. (I can cheat a bit here and calculate the bit values by manipulating the decimal ASCII code for each Mut letter - POWER(2,ASCII(Mut) - 64)).

DECLARE @myTable TABLE (
    IDnumber int,
    Mut varchar(1))

INSERT @myTable VALUES 
    (1,'A'),(1,'B'),(1,'C'),(1,'D'),
    (2,'A'),(2,'C'),(2,'D'),
    (3,'A'),(3,'B'),(3,'F'),(3,'Z'),
    (4,'A'),(4,'B'),(4,'E'),(4,'F'),
    (5,'Y'),
    (6,'X'),(6,'Z')

-- calculate the target bitmask
DECLARE @target bigint = (  SELECT SUM(POWER(2,ASCII(Mut) - 64))
                            FROM    (SELECT DISTINCT Mut FROM @myTable) AS x
                         )

;WITH baseCTE
AS
(
    --calculate a starting bitmask for each ID
    SELECT IDnumber, sum(mutbit) AS bitmask 
    FROM    (SELECT DISTINCT IDnumber, CAST(POWER(2,ASCII(Mut) - 64) AS bigint) AS mutbit FROM @myTable) AS x
    GROUP BY IDnumber
)
,recCTE
AS
(
    SELECT IDnumber, bitmask, CAST(IdNumber AS varchar(max)) AS trail, 1 AS lev
    FROM baseCTE
    UNION ALL
    SELECT b.IDnumber, b.bitmask | r.bitmask, CAST(CONCAT(r.trail,',',b.IDnumber) AS varchar(max)), r.lev + 1
    FROM recCTE AS r
    JOIN baseCTE AS b
    ON b.IDnumber > r.IDnumber
    WHERE b.bitmask | r.bitmask <> r.bitmask --ignore combinations which don't add any new Mut values
)
SELECT trail, lev 
FROM recCTE
WHERE bitmask = @target
ORDER BY lev

NB. SQL Server bitwise operators only work where one or other value is of an integer type, so this solution is restricted to 64 distinct Mut values (where the mask is a bigint) - to get it to work beyond that, you'd have to use a custom bitwise comparator (possibly using the CLR). However, since the question mentions that you have 43 Mut values, it might work for you for now.

like image 198
Ed Harper Avatar answered Nov 10 '22 07:11

Ed Harper


I would like a larger dataset to test on, but this matches your results for at least the dataset given...

DECLARE @myTable TABLE (
        IDnumber INT,
    Mut VARCHAR(1))

DECLARE @results TABLE (
    IDnumber INT)

INSERT @myTable VALUES 
    (1,'A'),(1,'B'),(1,'C'),(1,'D'),
    (2,'A'),(2,'C'),(2,'D'),
    (3,'A'),(3,'B'),(3,'F'),(3,'Z'),
    (4,'A'),(4,'B'),(4,'E'),(4,'F'),
    (5,'Y'),
    (6,'X'),(6,'Z')

DECLARE @IDnumber INT

WHILE EXISTS (SELECT 1 FROM @myTable)
BEGIN

    WITH MutRank AS
    (
        -- Find how many IDNumbers are associated with each mut
        SELECT Mut,
            COUNT(DISTINCT IDnumber) AS IDnumberCnt
        FROM @myTable
        GROUP BY Mut
    ), MutRarity AS
    (
        -- Translate the IDNumberCnt into a weighted rarity so dupes at  
        -- a IDNumberCnt level reduce their rarity compared to the next level 
        SELECT Mut,
            RANK() OVER (ORDER BY IDnumberCnt DESC) AS MutRarity
        FROM MutRank
    )
    -- Grab the IDNumber that is associated to the most "rare" muts
    SELECT @IDnumber = n.IDnumber
    FROM (SELECT TOP 1 m.IDnumber,
            SUM(MutRarity) AS IDNumberValue
        FROM @myTable m
        JOIN MutRarity mr
            ON m.Mut = mr.Mut
        GROUP BY m.IDnumber
        ORDER BY IDNumberValue DESC, IDnumber) n

    -- Save the number in our results
    INSERT @results (IDnumber)
    SELECT @IDnumber

    -- Purge all records for the "covered" muts and the selected IDNumber
    DELETE m2
    FROM @myTable m1
    JOIN @myTable m2
        ON m1.Mut = m2.Mut
        AND m1.IDnumber = @IDnumber
END

SELECT *
FROM @results
ORDER BY IDnumber
like image 2
Eric J. Price Avatar answered Nov 10 '22 06:11

Eric J. Price