Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

PHP/MYSQL datetime ranges overlapping for users

please I need help with this (for better understanding please see attached image) because I am completely helpless.

http://img16.imageshack.us/img16/7196/overlapsen.jpg

As you can see I have users and they store their starting and ending datetimes in my DB as YYYY-mm-dd H:i:s. Now I need to find out overlaps for all users according to the most frequent time range overlaps (for most users). I would like to get 3 most frequented datatime overlaps for most users. How can I do it?

I have no idea which mysql query should I use or maybe it would be better to select all datetimes (start and end) from database and process it in php (but how?). As stated on image results should be for example time 8.30 - 10.00 is result for users A+B+C+D.

Table structure:
UserID | Start datetime | End datetime
--------------------------------------
A | 2012-04-03 4:00:00 | 2012-04-03 10:00:00
A | 2012-04-03 16:00:00 | 2012-04-03 20:00:00
B | 2012-04-03 8:30:00 | 2012-04-03 14:00:00
B | 2012-04-06 21:30:00 | 2012-04-06 23:00:00
C | 2012-04-03 12:00:00 | 2012-04-03 13:00:00
D | 2012-04-01 01:00:01 | 2012-04-05 12:00:59
E | 2012-04-03 8:30:00 | 2012-04-03 11:00:00
E | 2012-04-03 21:00:00 | 2012-04-03 23:00:00
like image 218
peter Avatar asked Apr 01 '12 10:04

peter


1 Answers

What you effectively have is a collection of sets and want to determine if any of them have non-zero intersections. This is the exact question one asks when trying to find all the ancestors of a node in a nested set.

We can prove that for every overlap, at least one time window will have a start time that falls within all other overlapping time windows. Using this tidbit, we don't need to actually construct artificial timeslots in the day. Simply take a start time and see if it intersects any of the other time windows and then just count up the number of intersections.

So what's the query?

/*SELECT*/
SELECT DISTINCT
    MAX(overlapping_windows.start_time) AS overlap_start_time,
    MIN(overlapping_windows.end_time) AS overlap_end_time ,
    (COUNT(overlapping_windows.id) - 1) AS num_overlaps
FROM user_times AS windows
INNER JOIN user_times AS overlapping_windows
ON windows.start_time BETWEEN overlapping_windows.start_time AND overlapping_windows.end_time
GROUP BY windows.id
ORDER BY num_overlaps DESC;

Depending on your table size and how often you plan on running this query, it might be worthwhile to drop a spatial index on it (see below).

UPDATE

If your running this query often, you'll need to use a spatial index. Because of range based traversal (ie. does start_time fall in between the range of start/end), a BTREE index will not do anything for you. IT HAS TO BE SPATIAL.

ALTER TABLE user_times ADD COLUMN time_windows GEOMETRY NOT NULL DEFAULT 0;
UPDATE user_times SET time_windows = GeomFromText(CONCAT('LineString( -1 ', start_time, ', 1 ', end_time, ')'));
CREATE SPATIAL INDEX time_window ON user_times (time_window);

Then you can update the ON clause in the above query to read

ON MBRWithin( Point(0,windows.start_time), overlapping_windows.time_window )

This will get you an indexed traversal for the query. Again only do this if your planning on running the query often.

Credit for the spatial index to Quassoni's blog.

like image 200
tazer84 Avatar answered Oct 12 '22 09:10

tazer84