
TSQL equally divide resultset to groups and update them

I have my database with 3 tables like so:


Orders table has data like below:

OrderID    OperatorID    GroupID        OrderDesc    Status    Cash    ...
--------------------------------------------------------------------------
      1             1          1      small order         1     100 
      2             1          1    another order         2       0 
      3             1          2      xxxxxxxxxxx         2    1000 
      5             2          2      yyyyyyyyyyy         2     150 
      9             5          1      xxxxxxxxxxx         1       0 
     10          NULL          2      xxxxxxxxxxx         1      10 
     11          NULL          3      xxxxxxxxxxx         1     120 

Operators table:

OperatorID    Name    GroupID    Active
---------------------------------------
      1       John          1         1
      2       Kate          1         1
      4       Jack          2         1
      5       Will          1         0
      6        Sam          3         1

Group table:

GroupID    Name
---------------
      1      G1
      2      G2
      3      X1

As you can see, John has 3 orders, Kate 1, Will 1, and Jack and Sam none.

Now I would like to assign operators to orders based on some conditions:

  • order must have cash>0
  • order must have status=1
  • order must be in group 1 or 2
  • operator must be active (active=1)
  • operator must be in group 1 or 2

This is the result that I would like to get:

OrderID    OperatorID    GroupID        OrderDesc    Status    Cash    ...
--------------------------------------------------------------------------
      1             1          1      small order         1     100       < change
      2             1          1    another order         2       0 
      3             2          2      xxxxxxxxxxx         2    1000       < change
      5             4          2      yyyyyyyyyyy         2     150       < change
      9             5          1      xxxxxxxxxxx         1       0 
     10             4          2      xxxxxxxxxxx         1      10       < change
     11          NULL          3      xxxxxxxxxxx         1     120 

I would like to shuffle the orders and update operatorID so that every time I call this script I get a randomly assigned operatorID, but every operator has an equal number of orders (or close to equal: if I have 7 orders and 3 operators, one person will have 3 and the rest 2).

I can use NTILE to distribute orders into groups, but I need to assign operatorID to that group.

I think that I need to do something like this:

SELECT NTILE(2) OVER( order by orderID desc) as newID,* 
FROM
    orders(NOLOCK)

This will give me my orders table split into equal parts. What I still need is the number of rows in the operators table (to pass it as the parameter to NTILE); after that I could join my results with operators (using ROW_NUMBER()).

Is there a better solution?

My question again: How to equally divide result set into groups and update that record set using another table data?

EDIT: This is my code so far: http://sqlfiddle.com/#!3/39849/25

EDIT 2: I've updated my question and added the conditions listed above.

I'm building this query as a stored procedure.
The first step will be to generate the new assignments into a temporary table; the second step, after final approval, will update the main table based on that temp table.

I have 2 more questions:

  1. Will it be better to first select all orders and all operators that meet the conditions into a temporary table and then do the shuffling, or to do it all in one big query?

  2. I would like to pass an array of groups as a parameter to my procedure. Which option would be the best for passing an array to a stored procedure (SQL Server 2005)?

    I know this has been asked many times, but I would like to know whether it is better to create a separate function that splits a comma-separated string into a table (http://www.sommarskog.se/arrays-in-sql-2005.html) or to put everything inside one big fat procedure? :)


FINAL ANSWER: available at http://sqlfiddle.com/#!3/afb48/2

SELECT o.*, op.operatorName AS NewOperator, op.operatorID AS NewOperatorId
FROM (SELECT o.*, (ROW_NUMBER() over (ORDER BY newid()) % numoperators) + 1 AS randseqnum
      FROM Orders o CROSS JOIN
     (SELECT COUNT(*) AS numoperators FROM operators WHERE operators.active=1) op
      WHERE o.cash>0 and o.status in (1,3)
     ) o JOIN
     (SELECT op.*, ROW_NUMBER() over (ORDER BY newid()) AS seqnum
      FROM Operators op WHERE op.active=1
     ) op
     ON o.randseqnum = op.seqnum ORDER BY o.orderID

Answer based on Gordon Linoff's answer. Thanks!

Misiu asked Aug 21 '12 16:08


1 Answer

I wasn't sure if you really wanted an update query or a select query. The following query returns a new operator for each order, subject to your conditions:

/*
with orders as (select 1 as orderId, 'order1' as orderDesc, 1 as OperatorId),
     operators as (select 1 as operatorID, 'John' as name)
 */
select o.*, op.name as NewOperator, op.operatorID as NewOperatorId
from (select o.*, (ROW_NUMBER() over (order by newid()) % numoperators) + 1 as randseqnum
      from Orders o cross join
     (select COUNT(*) as numoperators from operators) op
     ) o join
     (select op.*, ROW_NUMBER() over (order by newid()) as seqnum
      from Operators op
     ) op
     on o.randseqnum = op.seqnum order by orderid 

It basically assigns a new id to the rows for the join. Each row in the orders table gets a randomly assigned value between 1 and the number of operators. This is then joined to a randomly ordered sequence number on the operators.
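To see why the modulo keeps the groups close to equal, here is a minimal sketch with 7 made-up orders and a hard-coded count of 3 operators (both are assumptions for illustration, not data from the question). Which order lands in which bucket changes on every run because of NEWID(), but the bucket sizes do not.

```sql
-- Hypothetical data: 7 order ids built inline, 3 operators assumed.
-- ROW_NUMBER() yields 1..7 in random order; (n % 3) + 1 maps that to 2,3,1,2,3,1,2.
SELECT x.randseqnum, COUNT(*) AS orders_assigned
FROM (SELECT (ROW_NUMBER() OVER (ORDER BY NEWID()) % 3) + 1 AS randseqnum
      FROM (SELECT 1 AS orderID UNION ALL SELECT 2 UNION ALL SELECT 3
            UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
            UNION ALL SELECT 7) AS o
     ) AS x
GROUP BY x.randseqnum
ORDER BY x.randseqnum;
-- Bucket 1 gets 2 orders, bucket 2 gets 3, bucket 3 gets 2.
```

Because the operators are numbered with their own random ROW_NUMBER(), the operator who receives the extra order also varies between runs.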

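If you need to update, then you can wrap the query in a CTE (dropping its trailing ORDER BY, which is not allowed inside a CTE without TOP) and do something like:

with toupdate as (<above query, without the order by>)
update orders
    set operatorid = toupdate.newoperatorid
    from toupdate
    where toupdate.orderid = orders.orderid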

Your two questions:

Will it be better to first select all orders and all operators that meet the conditions into a temporary table and then do the shuffling, or to do it all in one big query?

The use of temporary tables is a matter of performance and of the application's requirements. If the data is being rapidly updated, then yes, using a temporary table is a big win. If you are running the randomization many, many times on the same data, then it can be a win, particularly if the tables are too big to fit in memory. Otherwise, there is not likely to be a big performance gain on a one-time run, assuming you put the conditions within the innermost subqueries. However, if performance is an issue, you can test the two approaches.

I would like to pass an array of groups as a parameter to my procedure. Which option would be the best for passing an array to a stored procedure (SQL Server 2005)?

Hmmm, switch to 2008, which has table-valued parameters. Here is a highly referenced article on the subject by Erland Sommarskog: http://www.sommarskog.se/arrays-in-sql-2005.html.
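As a rough sketch of the two-step flow described in the question, with the conditions pushed into the innermost subqueries: the temp table name #NewAssignments is made up for illustration, and the column names are taken from the sample tables in the question, so adjust them to the real schema.

```sql
-- Step 1: build the proposed assignments (#NewAssignments is a hypothetical name).
-- Caveat: if no operator passes the filter, numoperators is 0 and the modulo
-- raises a divide-by-zero error, so check for that case first.
SELECT o.OrderID, op.OperatorID AS NewOperatorID
INTO #NewAssignments
FROM (SELECT o.OrderID,
             (ROW_NUMBER() OVER (ORDER BY NEWID()) % c.numoperators) + 1 AS randseqnum
      FROM Orders o
      CROSS JOIN (SELECT COUNT(*) AS numoperators
                  FROM Operators
                  WHERE Active = 1 AND GroupID IN (1, 2)) c
      WHERE o.Cash > 0 AND o.Status = 1 AND o.GroupID IN (1, 2)
     ) o
JOIN (SELECT op.OperatorID,
             ROW_NUMBER() OVER (ORDER BY NEWID()) AS seqnum
      FROM Operators op
      WHERE op.Active = 1 AND op.GroupID IN (1, 2)
     ) op
  ON o.randseqnum = op.seqnum;

-- Step 2: after approval, copy the assignments into the main table.
UPDATE o
SET OperatorID = n.NewOperatorID
FROM Orders o
JOIN #NewAssignments n ON n.OrderID = o.OrderID;

DROP TABLE #NewAssignments;
```

The operator filter appears twice on purpose: the count used in the modulo and the numbered operator list must cover exactly the same rows, otherwise some orders would find no matching seqnum and silently drop out of the join.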
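If staying on SQL Server 2005, one of the approaches that works there is to convert the comma-separated string to XML and shred it with nodes(). The sketch below assumes a hypothetical parameter named @groups that contains only integers and commas; characters that are special in XML (such as & or <) would break the cast.

```sql
-- @groups stands in for a hypothetical procedure parameter, e.g. '1,2'.
DECLARE @groups varchar(200);
SET @groups = '1,2';

-- Turn '1,2' into <g>1</g><g>2</g> and cast it to xml.
DECLARE @x xml;
SET @x = CAST('<g>' + REPLACE(@groups, ',', '</g><g>') + '</g>' AS xml);

-- One row per <g> element, converted to int.
SELECT t.g.value('.', 'int') AS GroupID
FROM @x.nodes('/g') AS t(g);
```

The resulting rows can replace a hard-coded list, for example `GroupID IN (SELECT t.g.value('.', 'int') FROM @x.nodes('/g') AS t(g))`. Wrapping this in a separate table-valued function keeps the main procedure shorter, at the cost of one more object to maintain.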

Gordon Linoff answered Oct 01 '22 04:10