Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SQL Group By Date Conflicts

I have a table with columns start_date and end_date. What we need to do is Select everything and group them by date conflicts for each Object_ID.

A date conflict is when a row's start date and/or end date pass through another rows'. For instance, here are some examples of conflicts:

Row 1 has dates 1st through the 5th, Row 2 has dates 2nd through the 3rd.

Row 1 has dates 2nd through the 5th, Row 2 has dates 1st through the 3rd.

Row 1 has dates 2nd through the 5th, Row 2 has dates 3rd through the 6th.

Row 1 has dates 2nd through the 5th, Row 2 has dates 1st through the 7th.

So for example, if we have some sample data (assume the numbers are just days of the month for simplicity):

id | object_id | start_date | end_date
1  | 1         | 1          | 5
2  | 1         | 2          | 4
3  | 1         | 6          | 8
4  | 2         | 2          | 3

What i would expect to see is this:

object_id | start_date | end_date | numconflicts
1         | <na>       | <na>     | 2
1         | 6          | 8        | 0 or null
2         | 2          | 3        | 0 or null

And for a Second Test Case, Here is some sample data:

id | object_id | start_date | end_date
1  | 1         | 1          | 5
2  | 1         | 2          | 4
3  | 1         | 6          | 8
4  | 2         | 2          | 3
5  | 2         | 4          | 5
6  | 1         | 2          | 3
7  | 1         | 10         | 12
8  | 1         | 11         | 13

And for the second Test Case, what I would expect to see as output:

object_id | start_date | end_date | numconflicts
1         | <na>       | <na>     | 3
1         | 6          | 8        | 0 or null
2         | 2          | 3        | 0 or null
2         | 4          | 5        | 0 or null
1         | <na>       | <na>     | 2

Yes, I will need some way of differentiating the first and the second grouping (the first and last rows) but I haven't quite figured that out. The goal is to view this list, and then when you click on a group of conflicts you can view all of the conflicts in that group.

My first thought was to attempt some GROUP BY CASE ... clause but I just wrapped by head around itself.

The language I am using to call mysql is php. So if someone knows of a php-loop solution rather than a large mysql query i am all ears.

Thanks in advance.

Edit: Added in primary Keys to provide a little less confusion.

Edit: Added in a Test case 2 to provide some more reasoning.

like image 211
Peanut Avatar asked Oct 04 '11 03:10

Peanut


People also ask

Can you GROUP BY date in SQL?

To group by date part, use the GROUP BY clause and the EXTRACT() function.

How do you group dates into a week in SQL?

How Do You Group Data by Week in SQL Server? SQL Server provides a function called DATEPART() , which returns a specified part (year, quarter, month, week, hour, minute, etc.) of a specified date. ORDER BY DATEPART(week, RegistrationDate);

What is GROUP BY clause in SQL with example?

In SQL, the GROUP BY clause is used to group rows by one or more columns. For example, SELECT country, COUNT(*) AS number FROM Customers GROUP BY country; Run Code. Here, the SQL command groups the rows by the country column, and counts the number of each country (because of the COUNT() function).

How to use query with GROUP BY?

The GROUP BY statement groups rows that have the same values into summary rows, like "find the number of customers in each country". The GROUP BY statement is often used with aggregate functions ( COUNT() , MAX() , MIN() , SUM() , AVG() ) to group the result-set by one or more columns.


1 Answers

This query finds the number of duplicates:

select od1.object_id, od1.start_date, od1.end_date, sum(od2.id is not null) as dups
from object_date od1
left join object_date od2
    on od2.object_id = od1.object_id
    and od2.end_date >= od1.start_date
    and od2.start_date <= od1.end_date
    and od2.id != od1.id
group by 1,2,3;

You can use this query as the basis of a query that gives you exactly what you asked for (see below for output).

select
  object_id,
  case dups when 0 then start_date else '<na>' end as start_date,
  case dups when 0 then end_date else '<na>' end as end_date,
  sum(dups) as dups
from (
  select od1.object_id, od1.start_date, od1.end_date, sum(od2.id is not null) as dups
  from object_date od1
  left join object_date od2
    on od2.object_id = od1.object_id
    and od2.end_date >= od1.start_date
    and od2.start_date <= od1.end_date
    and od2.id != od1.id
  group by 1,2,3) x
group by 1,2,3;

Note that I have used an id column to distinguish the rows. However, you could replace the test of id's not matching with comparisons on every column, ie replace od2.id != od1.id with tests that every other column is not equal, but that would require a unique index on all the other columns to make sense, and having an id column is a good idea anyway.

Here's a test using your data:

create table object_date (
    id int primary key auto_increment,
    object_id int,
    start_date int,
    end_date int
);
insert into object_date (object_id, start_date, end_date) 
    values (1,1,5),(1,2,4),(1,6,8),(2,2,3);

Output of first query when run against this sample data:

+-----------+------------+----------+------+
| object_id | start_date | end_date | dups |
+-----------+------------+----------+------+
|         1 |          1 |        5 |    1 |
|         1 |          2 |        4 |    1 |
|         1 |          6 |        8 |    0 |
|         2 |          2 |        3 |    0 |
+-----------+------------+----------+------+

Output of second query when run against this sample data:

+-----------+------------+----------+------+
| object_id | start_date | end_date | dups |
+-----------+------------+----------+------+
|         1 | 6          | 8        |    0 |
|         1 | <na>       | <na>     |    2 |
|         2 | 2          | 3        |    0 |
+-----------+------------+----------+------+
like image 186
Bohemian Avatar answered Nov 15 '22 01:11

Bohemian