I have a table of orders that I know contains duplicates:
customer   order_number order_date
---------- ------------ -------------------
1          1            2012-03-01 01:58:00
1          2            2012-03-01 02:01:00
1          3            2012-03-01 02:03:00
2          4            2012-03-01 02:15:00
3          5            2012-03-01 02:18:00
3          6            2012-03-01 04:30:00
4          7            2012-03-01 04:35:00
5          8            2012-03-01 04:38:00
6          9            2012-03-01 04:58:00
6          10           2012-03-01 04:59:00
I want to find all duplicates (orders by the same customer within 60 minutes of each other). I'd accept either a result set consisting of the 'duplicate' rows or a set of all customers with a count of how many duplicates each has.
Here is what I have tried:
SELECT
    customer,
    COUNT(*)
FROM
    orders
GROUP BY
    customer,
    DATEPART(HOUR, order_date)
HAVING (COUNT(*) > 1)
This doesn't work when duplicates are within 60 minutes of each other but fall in different clock hours, e.g. 1:58 and 2:02.
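To make the hour-boundary problem concrete, here is a minimal sketch using the timestamps of the first two sample rows. The variable names `@a` and `@b` are only illustrative; the point is that the two orders land in different `DATEPART(HOUR, ...)` buckets even though they are only a few minutes apart.

```sql
-- Illustrative variables holding the first two sample order dates
DECLARE @a datetime2 = '2012-03-01T01:58:00';
DECLARE @b datetime2 = '2012-03-01T02:01:00';

SELECT
    DATEPART(HOUR, @a)       AS hour_a,        -- 1
    DATEPART(HOUR, @b)       AS hour_b,        -- 2, so GROUP BY puts it in another group
    DATEDIFF(MINUTE, @a, @b) AS minutes_apart; -- 3
```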
I've also tried this:
SELECT
    o1.customer,
    o1.order_number,
    o2.order_number,
    DATEDIFF(MINUTE, o1.order_date, o2.order_date) AS [diff]
FROM
    orders o1 LEFT OUTER JOIN
    orders o2 ON o1.customer = o2.customer AND o1.order_number <> o2.order_number
WHERE
    ABS(DATEDIFF(MINUTE, o1.order_date, o2.order_date)) < 60
Now this gives me all of the duplicates, but it also gives me multiple rows per duplicate order, i.e. (o1, o2) and (o2, o1). That wouldn't be so bad if there weren't some orders with multiple duplicates. In those cases I get (o1, o2), (o1, o3), (o2, o1), (o2, o3), (o3, o1), (o3, o2), etc. In other words, I get all of the permutations.
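One possible way to cut the permutations down to one row per pair is to compare the order numbers with `<` instead of `<>`, so each pair is only reported with the lower order number first. This is only a sketch, and it assumes `order_number` is unique within the table. The aliases `earlier_order` and `later_order` are made up for readability. An `INNER JOIN` is used because the `WHERE` clause on `o2` discards the unmatched rows of the outer join anyway.

```sql
SELECT
    o1.customer,
    o1.order_number AS earlier_order,
    o2.order_number AS later_order,
    DATEDIFF(MINUTE, o1.order_date, o2.order_date) AS [diff]
FROM
    orders o1
    INNER JOIN orders o2
        ON  o1.customer = o2.customer
        AND o1.order_number < o2.order_number   -- report each pair once, not twice
WHERE
    ABS(DATEDIFF(MINUTE, o1.order_date, o2.order_date)) < 60
```

A customer with three orders close together still produces three rows, (o1, o2), (o1, o3) and (o2, o3), so this halves the output rather than giving one row per duplicate order.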
Anyone have some insight? I'm not necessarily looking for the best performing answer here, just one that works.
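SELECT
    *,
    CASE WHEN EXISTS (SELECT *
                      FROM orders AS lookup
                      WHERE lookup.customer = orders.customer
                        AND lookup.order_date < orders.order_date
                        -- compare against the outer row's date, not lookup's own
                        AND lookup.order_date >= DATEADD(hour, -1, orders.order_date)
                     )
         THEN 'Duplicate Order'   -- an earlier order exists within the last hour
         ELSE 'Principal Order'
    END AS Order_Status
FROM
    orders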
Using EXISTS and a correlated sub-query, you can check whether the same customer placed any preceding orders in the last hour.
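The question also asked for a count of duplicates per customer. A sketch of how the same EXISTS check might be reused for that is shown here, assuming the `orders` table from the question. The aliases `o` and `prev` and the column name `duplicate_count` are illustrative. Only orders that have an earlier order from the same customer in the preceding hour are counted.

```sql
SELECT
    o.customer,
    COUNT(*) AS duplicate_count
FROM
    orders AS o
WHERE EXISTS (SELECT *
              FROM orders AS prev
              WHERE prev.customer = o.customer
                AND prev.order_date < o.order_date
                -- earlier order no more than one hour before this one
                AND prev.order_date >= DATEADD(HOUR, -1, o.order_date))
GROUP BY
    o.customer
```

Against the sample data this should return customer 1 with a count of 2 (orders 2 and 3) and customer 6 with a count of 1 (order 10). Customer 3 is not listed because its two orders are more than an hour apart.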