
How to find duplicates and gaps in this scenario in MySQL

Tags: select, mysql

Hi, I have a table that looks like this:

-----------------------------------------------------------
|  id  |  group_id | source_id | target_id | sortsequence |
-----------------------------------------------------------
|  2   |    1      |    2      |   4       |     1        |   
-----------------------------------------------------------
|  4   |    1      |    20     |   2       |     1        |   
-----------------------------------------------------------
|  5   |    1      |    2      |   14      |     1        |   
-----------------------------------------------------------
|  7   |    1      |    2      |   7       |     3        |   
-----------------------------------------------------------
|  20  |    2      |    20     |   4       |     3        |   
-----------------------------------------------------------
|  21  |    2      |    20     |   4       |     1        |   
-----------------------------------------------------------
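
To try the queries on this page against the sample rows, the table above can be recreated roughly as follows. This is only a sketch: the INT column types and the primary key on id are assumptions, and `table` is the placeholder name used in the queries, which needs backticks because TABLE is a reserved word in MySQL.

```sql
-- Assumed schema: column types and the primary key are guesses, not from the question.
CREATE TABLE `table` (
    id           INT NOT NULL PRIMARY KEY,
    group_id     INT NOT NULL,
    source_id    INT NOT NULL,
    target_id    INT NOT NULL,
    sortsequence INT NOT NULL
);

-- The six sample rows shown above.
INSERT INTO `table` (id, group_id, source_id, target_id, sortsequence) VALUES
    (2,  1, 2,  4,  1),
    (4,  1, 20, 2,  1),
    (5,  1, 2,  14, 1),
    (7,  1, 2,  7,  3),
    (20, 2, 20, 4,  3),
    (21, 2, 20, 4,  1);
```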

Scenario

There are two scenarios that need to be handled.

  1. The sortsequence value should be unique for each combination of source_id and group_id. For example, all records with group_id = 1 AND source_id = 2 should have unique sortsequence values. In the example above, the records with id = 2 and id = 5 both have group_id = 1 and source_id = 2, yet share the same sortsequence, 1. These are faulty records, and I need to find them.
  2. If group_id and source_id are the same, the sortsequence values should be continuous, with no gaps. For example, in the table above the records with id = 20 and id = 21 have the same group_id and source_id, and their sortsequence values are 3 and 1. These values are unique, but there is a gap (2 is missing). I need to find these records as well.

My Effort So Far

I have written this query:

SELECT source_id, `group_id`, GROUP_CONCAT(id) AS children
FROM
    `table` -- placeholder name; TABLE is a reserved word, so it must be backtick-quoted
GROUP BY source_id,
  sortsequence,
  `group_id`
HAVING COUNT(*) > 1

This query only addresses scenario 1. How can I handle scenario 2? Is there a way to do it in the same query, or do I have to write another one for the second scenario?

By the way, the query will be dealing with millions of records, so performance must be very good.

Asked by Awais Qarni, Mar 26 '13 07:03



1 Answer

Got the answer from Tere J's comments. The following query covers both criteria mentioned above.

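 SELECT
     source_id, `group_id`, GROUP_CONCAT(id) AS faultyIDS
 FROM
     `table` -- placeholder name; TABLE is a reserved word, so it must be backtick-quoted
 GROUP BY
     source_id, group_id
 HAVING
     COUNT(DISTINCT sortsequence) <> COUNT(sortsequence) -- duplicates within the group
     OR COUNT(sortsequence) <> MAX(sortsequence)         -- gap: n values should end at n
     OR MIN(sortsequence) <> 1                           -- sequence should start at 1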
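
If it helps to know which rule a group breaks, the same aggregates can be exposed as separate flags. This is a sketch under the same assumptions as the query above (sortsequence is never NULL and should run 1, 2, ..., n within each source_id/group_id pair). The aliases ids, has_duplicates and has_gaps are made up for illustration.

```sql
SELECT
    source_id,
    group_id,
    GROUP_CONCAT(id ORDER BY sortsequence, id) AS ids,
    -- 1 when some sortsequence value occurs more than once in the group
    (COUNT(*) <> COUNT(DISTINCT sortsequence)) AS has_duplicates,
    -- 1 when the distinct values do not form 1, 2, ..., n
    (MIN(sortsequence) <> 1
        OR MAX(sortsequence) <> COUNT(DISTINCT sortsequence)) AS has_gaps
FROM
    `table`
GROUP BY
    source_id, group_id
HAVING
    has_duplicates OR has_gaps;
```

Note that GROUP_CONCAT output is truncated at group_concat_max_len (1024 bytes by default), so very large groups may not list every id unless that setting is raised.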

Maybe it can help others.
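
On the performance concern from the question (millions of rows), one thing that may help is a composite index covering the grouping columns and sortsequence, so the aggregates can be read from the index rather than from full rows. Treat this as something to measure, not a guarantee: the index name is hypothetical, and whether the optimizer uses the index depends on the MySQL version and the data distribution.

```sql
-- Hypothetical index name; column order follows the GROUP BY in the query above.
CREATE INDEX idx_source_group_seq
    ON `table` (source_id, group_id, sortsequence);

-- Compare the plan before and after adding the index.
EXPLAIN
SELECT source_id, group_id, GROUP_CONCAT(id) AS faultyIDS
FROM `table`
GROUP BY source_id, group_id
HAVING COUNT(DISTINCT sortsequence) <> COUNT(sortsequence)
    OR COUNT(sortsequence) <> MAX(sortsequence)
    OR MIN(sortsequence) <> 1;
```

If the table is InnoDB with id as its primary key (an assumption), secondary indexes also carry the primary key, so the query can potentially be answered from the index alone.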

Answered by Awais Qarni, Sep 30 '22 20:09