
Puzzle: Spreading numbers evenly in groups

This is more of a puzzle really. It's probably been asked elsewhere before, but I couldn't find anything, so I thought I'd share the question.

I'm trying to implement some kind of load balancing in an application and have reduced the problem to what I believe should be a simple T-SQL exercise (the application lives predominantly in the SQL Server domain; SQL Server 2008 R2).

Basically I have a table with two integers: a unique, sequential Id and a non-unique Value. The table could hold any number of records, and I'd like to produce a table of data where the n largest Values are split into separate 'groupings', then the next n largest Values are split across the same 'groupings', and so on.

I've got a working first draft below, but I believe it can be improved...

DECLARE @GroupCount INT = 5

-- Set up the test data
DECLARE @Test TABLE (Id INT IDENTITY(1, 1), Value INT)
INSERT  @Test (Value)
VALUES  (100), (456), (121), (402), (253), (872), (765), (6529), (1029), (342), (98), (1), (0), (4), (46), (23), (456), (416), (2323), (4579)


--Order by Value descending
;WITH cte AS
(
    SELECT  *
            ,ROW_NUMBER() OVER (ORDER BY Value DESC) RowNum
    FROM    @Test
)
--use modulus to split into grouping
, cte2 AS
(
    SELECT  *
            ,ROW_NUMBER() OVER (PARTITION BY RowNum % @GroupCount ORDER BY RowNum DESC) Rnk
    FROM    cte
)
SELECT  ROW_NUMBER() OVER (PARTITION BY Rnk ORDER BY Value DESC) AS 'Grouping'
    ,Value
    ,Id
FROM    cte2
ORDER BY [Grouping], Value ASC

This works and produces the following dataset:

Grouping,   Value,      Id
========    =====       ==
1           46          15
1           342         10
1           765         7
1           6529        8
2           23          16
2           253         5
2           456         2
2           4579        20
3           4           14
3           121         3
3           456         17
3           2323        19
4           1           12
4           100         1
4           416         18
4           1029        9
5           0           13
5           98          11
5           402         4
5           872         6

The dataset returned is correct in that the first n largest values are split into separate groupings and so on, but the total of the Values varies widely between groupings; compare grouping 1 with grouping 5, for example.

When grouped and SUMmed, we can see the uneven spread:

Grouping,   SummedValues
========    ============
1           7682
2           5311
3           2904
4           1546
5           1372

In as few lines as possible how can I better balance the Values so that the total Values in each grouping is more evenly spread?

asked Mar 16 '17 by Drammy


4 Answers

This is flawed, but not terrible given the example data. Your mileage may vary.

declare @groupcount int = 5;
create table t (id int identity(1, 1), value int);
insert  t values 
    (100),(456),(121),(402),(253),(872),(765),(6529),(1029),(342)
  , (98),(1),(0),(4),(46),(23),(456),(416),(2323),(4579);
;with cte as (
  select *
      , rn = row_number() over (order by value asc)
      , pct = value/sum(value+.0) over()   -- each row's share of the grand total
      , target = 1.0 / @groupcount         -- ideal share per group
  from t
)
, remaining as (
  -- cut the running total into @groupcount bands of the grand total
  select id, value, rn
    , grp = convert(int,(sum(value) over (order by rn)/sum(value+.0) over())*@groupCount)+1
  from cte
)
select
    grp = row_number() over (order by sum(value) desc)  -- relabel groups by descending sum
  , sumValue = sum(value)
from remaining
group by grp

rextester demo: http://rextester.com/UNV61100

results:

+-----+----------+
| grp | sumValue |
+-----+----------+
|   1 |     6529 |
|   2 |     4579 |
|   3 |     3483 |
|   4 |     2323 |
|   5 |     1901 |
+-----+----------+
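For illustration only, the banding arithmetic can be replayed outside SQL. This is a Python sketch of the same logic, with the question's values hard-coded; it is not part of the answer's T-SQL:

```python
# Walk the values in ascending order (matching "order by value asc")
# and assign each row to band floor(running_total / grand_total * n) + 1,
# mirroring the convert(int, ...) + 1 expression in the SQL.
values = [100, 456, 121, 402, 253, 872, 765, 6529, 1029, 342,
          98, 1, 0, 4, 46, 23, 456, 416, 2323, 4579]
n = 5
total = sum(values)

groups = {}
running = 0
for v in sorted(values):
    running += v
    g = int(running / total * n) + 1
    groups.setdefault(g, []).append(v)

sums = sorted((sum(vs) for vs in groups.values()), reverse=True)
print(sums)   # [6529, 4579, 3483, 2323, 1901]
```

The sums match the result table above once relabelled in descending order, which is exactly what the final `row_number() over (order by sum(value) desc)` does.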


SQL Server 2008-compatible version (2008 has no SUM() OVER (ORDER BY ...) running total, so it uses a CROSS JOIN for the grand total and OUTER APPLY for the running sum):

declare @groupcount int = 5;
create table t (id int identity(1, 1), value int);
insert  t values 
    (100),(456),(121),(402),(253),(872),(765),(6529),(1029),(342)
  , (98),(1),(0),(4),(46),(23),(456),(416),(2323),(4579);
;with cte as (
  select *
      , rn = row_number() over (order by value asc)
      , pct = value/tv.TotalValue
      , target = 1.0 / @groupcount 
  from t
    cross join (select TotalValue = sum(value+.0) from t) tv
)
, remaining as (
  -- running total computed via outer apply, since 2008 lacks
  -- a windowed sum with order by
  select id, value, rn
    , grp = convert(int,((x.sumValueOver/TotalValue)*@groupcount)+1)
  from cte
    outer apply (
      select sumValueOver = sum(value)
      from cte i
      where i.rn <= cte.rn
        ) x
)
select
    grp = row_number() over (order by sum(value) desc)
  , sumValue = sum(value)
from remaining
group by grp

rextester demo: http://rextester.com/DEUDJ77007

returns:

+-----+----------+
| grp | sumValue |
+-----+----------+
|   1 |     6529 |
|   2 |     4579 |
|   3 |     3483 |
|   4 |     2323 |
|   5 |     1901 |
+-----+----------+
answered Nov 15 '22 by SqlZim


The NTILE function in SQL Server may help you here.

DECLARE @GroupCount INT = 5

-- Set up the test data
DECLARE @Test TABLE (Id INT IDENTITY(1, 1), Value INT)
INSERT  @Test (Value)
VALUES  (100), (456), (121), (402), (253), (872), (765), (6529), (1029), (342), (98), (1), (0), (4), (46), (23), (456), (416), (2323), (4579)

;With cte
AS
(
    SELECT *, NTILE(@GroupCount) OVER(ORDER BY Value DESC) AS GroupNo FROM @Test
)
SELECT GroupNo, SUM(Value) AS SummedValues FROM cte
GROUP BY GroupNo

And I get this result:

GroupNo SummedValues
--------------------
1       14460
2       2549
3       1413
4       365
5       28
answered Nov 15 '22 by Krishnraj Rana


A slightly better way to do this would be to "snake" the selections. You're lining up the 1st, 6th, and 11th highest; of course that's way higher than the 5th, 10th, and 15th.

Better would be 1st, 10th, 11th, versus 5th, 6th, 15th. Still not perfect, and with your particular data still very poor, but slightly better than yours.
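To make the snake pattern concrete, here is a Python sketch of the dealing order (values hard-coded from the question, not part of the SQL below). It produces the same five totals as the summed results at the end of this answer, though the group labels come out in a different order:

```python
# Snake deal: sort descending, hand out rounds of n values,
# reversing direction on every other round (1..n, then n..1, ...).
values = sorted([100, 456, 121, 402, 253, 872, 765, 6529, 1029, 342,
                 98, 1, 0, 4, 46, 23, 456, 416, 2323, 4579], reverse=True)
n = 5
groups = {g: [] for g in range(1, n + 1)}
for i, v in enumerate(values):
    rnd, pos = divmod(i, n)
    g = pos + 1 if rnd % 2 == 0 else n - pos   # reverse every other round
    groups[g].append(v)

sums = {g: sum(vs) for g, vs in groups.items()}
print(sums)   # {1: 7273, 2: 5249, 3: 2904, 4: 1608, 5: 1781}
```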

DECLARE @GroupCount INT = 5

-- Set up the test data
DECLARE @Test TABLE (Id INT IDENTITY(1, 1), Value INT)
INSERT  @Test (Value)
VALUES  (100), (456), (121), (402), (253), (872), (765), (6529), (1029), (342), (98), (1), (0), (4), (46), (23), (456), (416), (2323), (4579)


--Order by Value descending
;WITH cte AS
(
    SELECT  *
            ,ROW_NUMBER() OVER (ORDER BY Value DESC) RowNum
    FROM    @Test
)
--use modulus to split into grouping
, cte2 AS
(
    SELECT  *
            ,ROW_NUMBER() OVER (PARTITION BY RowNum % (@GroupCount*2 ) ORDER BY RowNum DESC) Rnk
    FROM    cte
)
SELECT  [Grouping], SUM(Value) AS SummedValues
FROM    (
        SELECT  FLOOR(ABS(@GroupCount - (ROW_NUMBER() OVER (PARTITION BY Rnk ORDER BY Value DESC) - 0.5)) + 0.5) AS 'Grouping'
            ,Value
            ,Id
        FROM    cte2
        ) a
GROUP BY [Grouping]
ORDER BY [Grouping] ASC

Ultimately, though, I think random assignment is likely better than this; perhaps random assignment combined with a check that a group's sum hasn't already reached 2 * (total / group count).

Really I think this is not a problem well solved by TSQL or any SQL; languages that can control flow on a row by row basis will better serve you. Python, C#, SAS, whatever other tool that is in your toolbox. (PL/SQL is the one place I'd consider going here...)

Anything that would let you say, on a row-level basis, "Having kept track of what I've assigned so far, assign this particular case to the bucket with the lowest number so far" really would work better.
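That row-by-row idea is essentially the classic greedy "longest processing time" heuristic for multiway partitioning. A minimal Python sketch, with the question's values hard-coded (this is an illustration, not any answer's SQL):

```python
import heapq

# Greedy "lightest bucket" assignment: take the values largest-first
# and drop each one into whichever group currently has the smallest sum.
values = sorted([100, 456, 121, 402, 253, 872, 765, 6529, 1029, 342,
                 98, 1, 0, 4, 46, 23, 456, 416, 2323, 4579], reverse=True)
n = 5
heap = [(0, g) for g in range(1, n + 1)]   # (current sum, group id)
heapq.heapify(heap)
for v in values:
    s, g = heapq.heappop(heap)             # lightest group so far
    heapq.heappush(heap, (s + v, g))

print(sorted((s for s, g in heap), reverse=True))   # [6529, 4579, 2576, 2566, 2565]
```

The 6529 group is unavoidable (that single value already exceeds total / 5 ≈ 3763), but the remaining four groups land within a few units of each other, far tighter than any of the set-based attempts.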

Grouping Summed Values
---------------------

1       1781
2       1608
3       2904
4       5249
5       7273
answered Nov 15 '22 by Joe


This uses the ntile and row_number window functions together to split the rows into groups that are even by count (not by sum), while choosing which values land in each group so as to even out the totals as much as possible.

Answer:

declare @ntile_cnt int = 4;  -- assumed value, not declared in the original snippet; 4 tiles of 5 rows reproduces the 5 groupings shown in the results

select case b.grp_split when 1 then b.grp_split_rnk_desc else grp_split_rnk_asc end as [grouping]
, b.value
, b.id
from (
    select a.id
    , a.value
    , a.grp_split
    , row_number() over (partition by a.grp_split order by a.value desc) grp_split_rnk_desc
    , row_number() over (partition by a.grp_split order by a.value asc) grp_split_rnk_asc
    from (
        select t.id
        , t.value
        , ntile(@ntile_cnt) over (order by t.value desc) as grp_split
        from @test as t
        ) as a
    ) as b
order by case b.grp_split when 1 then b.grp_split_rnk_desc else grp_split_rnk_asc end asc
, b.value asc

Results:

Not perfect, but slightly closer.

Group   Total
1       7029
2       5096
3       2904
4       1761
5       2025
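For illustration, the tile-and-alternate logic can be replayed in a short Python sketch. The values are hard-coded from the question, and the tile count of 4 is an assumption inferred from the 5-group results above (20 rows / 4 tiles = tiles of 5):

```python
import math

# Split the descending values into tiles; the first tile keeps
# descending order, every later tile is reversed to ascending before
# being dealt across the groupings, mirroring the case expression.
values = sorted([100, 456, 121, 402, 253, 872, 765, 6529, 1029, 342,
                 98, 1, 0, 4, 46, 23, 456, 416, 2323, 4579], reverse=True)
ntile_cnt = 4
size = math.ceil(len(values) / ntile_cnt)
sums = {}
for t in range(ntile_cnt):
    tile = values[t * size:(t + 1) * size]
    if t > 0:
        tile = tile[::-1]             # ascending for tiles after the first
    for g, v in enumerate(tile, start=1):
        sums[g] = sums.get(g, 0) + v

print(sums)   # {1: 7029, 2: 5096, 3: 2904, 4: 1761, 5: 2025}
```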
answered Nov 15 '22 by tarheel