
Finding Duplicate Orders (by time proximity)

Tags: sql, sql-server

I have a table of orders that I know contains duplicates:

    customer   order_number   order_date
   ----------  ------------   -------------------
          1             1     2012-03-01 01:58:00
          1             2     2012-03-01 02:01:00
          1             3     2012-03-01 02:03:00
          2             4     2012-03-01 02:15:00
          3             5     2012-03-01 02:18:00
          3             6     2012-03-01 04:30:00
          4             7     2012-03-01 04:35:00
          5             8     2012-03-01 04:38:00
          6             9     2012-03-01 04:58:00
          6            10     2012-03-01 04:59:00

I want to find all duplicates, meaning orders by the same customer within 60 minutes of each other. The result can be either a result set of the 'duplicate' rows or a list of customers with a count of how many duplicates each has.

Here is what I have tried:

SELECT
   customer,
   count(*)
FROM
   orders
GROUP BY
   customer,
   DATEPART(HOUR, order_date)
HAVING (count(*) > 1)

This doesn't work when duplicates are within 60 minutes of each other but fall in different clock hours, e.g. 1:58 and 2:02.

I've also tried this:

SELECT
  o1.customer,
  o1.order_number,
  o2.order_number,
  DATEDIFF(MINUTE,o1.order_date, o2.order_date) AS [diff]
FROM
  orders o1 LEFT OUTER JOIN
  orders o2 ON o1.customer = o2.customer AND o1.order_number <> o2.order_number
WHERE
  ABS(DATEDIFF(MINUTE,o1.order_date, o2.order_date)) < 60

Now this gives me all of the duplicates, but it also gives me multiple rows per duplicate order, i.e. (o1, o2) and (o2, o1). That wouldn't be so bad if there weren't some orders with multiple duplicates. In those cases I get (o1, o2), (o1, o3), (o2, o1), (o2, o3), (o3, o1), (o3, o2), etc. I get all of the permutations.
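The permutations are easy to reproduce on the sample data. The following is a minimal sketch, not a solution, and it assumes SQLite (through Python's sqlite3 module) as a stand-in for SQL Server: strftime('%s', ...) replaces DATEDIFF, and the helper name close_pairs is hypothetical. It also shows that swapping <> for < keeps each pair only once, yet still lists every pair among a customer's close orders.

```python
import sqlite3

# Sample rows copied from the question: (customer, order_number, order_date).
ROWS = [
    (1, 1, "2012-03-01 01:58:00"),
    (1, 2, "2012-03-01 02:01:00"),
    (1, 3, "2012-03-01 02:03:00"),
    (2, 4, "2012-03-01 02:15:00"),
    (3, 5, "2012-03-01 02:18:00"),
    (3, 6, "2012-03-01 04:30:00"),
    (4, 7, "2012-03-01 04:35:00"),
    (5, 8, "2012-03-01 04:38:00"),
    (6, 9, "2012-03-01 04:58:00"),
    (6, 10, "2012-03-01 04:59:00"),
]

# Assumption: SQLite syntax, not T-SQL. strftime('%s', x) yields epoch seconds,
# so "< 3600" plays the role of ABS(DATEDIFF(MINUTE, ...)) < 60.
# A plain JOIN is used because the WHERE clause on o2 discards the NULL rows
# a LEFT OUTER JOIN would add anyway. "{op}" is filled with "<>" or "<".
PAIRS_SQL = """
SELECT o1.order_number, o2.order_number
FROM orders AS o1
JOIN orders AS o2
  ON o1.customer = o2.customer
 AND o1.order_number {op} o2.order_number
WHERE ABS(CAST(strftime('%s', o2.order_date) AS INTEGER)
        - CAST(strftime('%s', o1.order_date) AS INTEGER)) < 3600
ORDER BY o1.order_number, o2.order_number
"""


def close_pairs(op):
    """Return (order, other_order) pairs placed less than an hour apart."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (customer INTEGER, order_number INTEGER, order_date TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ROWS)
    pairs = conn.execute(PAIRS_SQL.format(op=op)).fetchall()
    conn.close()
    return pairs


all_permutations = close_pairs("<>")  # both (a, b) and (b, a)
unique_pairs = close_pairs("<")       # each unordered pair once

print(len(all_permutations), all_permutations)
print(len(unique_pairs), unique_pairs)
```

On the sample rows the <> version returns eight rows and the < version four: (1, 2), (1, 3), (2, 3) and (9, 10). Customer 1 still shows up three times, which is why pair-wise joins alone do not give one row per duplicate.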

Anyone have some insight? I'm not necessarily looking for the best performing answer here, just one that works.

Ben English asked Mar 02 '12 15:03


1 Answer

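SELECT
  *,
  -- Flag an order as a duplicate when the same customer placed
  -- an earlier order within the preceding hour.
  CASE WHEN EXISTS (SELECT *
                      FROM orders AS lookup
                     WHERE lookup.customer    = orders.customer
                       AND lookup.order_date <  orders.order_date
                       AND lookup.order_date >= DATEADD(hour, -1, orders.order_date)
                   )
       THEN 'Duplicate Order'
       ELSE 'Principal Order'
  END AS Order_Status
FROM
  orders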
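Using EXISTS and a correlated sub-query, you can check whether the same customer placed any earlier order within the preceding hour. Note that the columns of the outer row must be qualified (orders.order_date); an unqualified order_date inside the sub-query resolves to the lookup alias.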
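To check the idea against the sample data from the question, here is a minimal sketch that assumes SQLite (through Python's sqlite3 module) as a stand-in for SQL Server. datetime(x, '-1 hours') takes the place of DATEADD(hour, -1, x), and the names run_query, DUPLICATES_SQL and COUNTS_SQL are hypothetical. It covers both outputs the question asks for: the duplicate rows and a count per customer.

```python
import sqlite3

# Sample rows copied from the question: (customer, order_number, order_date).
ROWS = [
    (1, 1, "2012-03-01 01:58:00"),
    (1, 2, "2012-03-01 02:01:00"),
    (1, 3, "2012-03-01 02:03:00"),
    (2, 4, "2012-03-01 02:15:00"),
    (3, 5, "2012-03-01 02:18:00"),
    (3, 6, "2012-03-01 04:30:00"),
    (4, 7, "2012-03-01 04:35:00"),
    (5, 8, "2012-03-01 04:38:00"),
    (6, 9, "2012-03-01 04:58:00"),
    (6, 10, "2012-03-01 04:59:00"),
]

# Assumption: SQLite syntax, not T-SQL. The predicate is true for an order "o"
# when the same customer has an earlier order within the preceding hour.
IS_DUPLICATE = """
EXISTS (SELECT 1
          FROM orders AS prev
         WHERE prev.customer   = o.customer
           AND prev.order_date <  o.order_date
           AND prev.order_date >= datetime(o.order_date, '-1 hours'))
"""

# The duplicate rows themselves.
DUPLICATES_SQL = f"""
SELECT o.customer, o.order_number
FROM orders AS o
WHERE {IS_DUPLICATE}
ORDER BY o.order_number
"""

# One row per customer with the number of duplicate orders.
COUNTS_SQL = f"""
SELECT o.customer, COUNT(*) AS duplicate_count
FROM orders AS o
WHERE {IS_DUPLICATE}
GROUP BY o.customer
ORDER BY o.customer
"""


def run_query(sql):
    """Load the sample rows into an in-memory table and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (customer INTEGER, order_number INTEGER, order_date TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ROWS)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


duplicates = run_query(DUPLICATES_SQL)
counts = run_query(COUNTS_SQL)

print(duplicates)
print(counts)
```

On the sample rows this flags orders 2, 3 and 10, giving two duplicates for customer 1 and one for customer 6. Orders 5 and 6 of customer 3 are more than an hour apart, so neither is flagged. One design consequence worth knowing: each order is compared with any earlier order, not with the first order of a burst, so a long chain of orders spaced less than an hour apart is flagged in full even if the chain spans more than 60 minutes.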

MatBailie answered Sep 28 '22 17:09