Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL date diff iteration query - streamline query or optimize data structure

By way of introduction...
I've run into this question: Difference between 2 adjacent fields - Date - PHP MYSQL and was trying to achieve the goal, i.e iterate through dates and get diff, with pure MySQL.
Another question there (Subtracting one row of data from another in SQL) helped me to understand how to make something similar with MySQL. It did not solve the problem, as the solutions are still depandant either on fixed values or on assumed order of data, but it did help me to understand the methodology.
There is one other question (How to get next/previous record in MySQL?) with answers describing how to get values from next/previous row. It's still dependand on some fixed values, but I learned how to use the technique.

Say I have this table foo:

CREATE TABLE `foo` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `dateof` date NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
  id | dateof
-----+------------
   1 | 2012-01-01
   2 | 2012-01-02
  11 | 2012-01-04
  12 | 2012-01-01
  13 | 2012-01-02
  14 | 2012-01-09
 111 | 2012-01-01
 112 | 2012-01-01
 113 | 2012-01-01

There are two assumptions:

  1. Primary key (id) ordered ascending and "holes" allowed.
  2. Every date in dateof column is valid, in the meaning: no NULLs and no defaults (0000-00-00). I want to iterate through every row and calculate number of days passed with previous entry, to receive this:
  id | date       | days_diff
-----+------------+-----------
   1 | 2012-01-01 |     0
   2 | 2012-01-02 |     1
  11 | 2012-01-04 |     2
  12 | 2012-01-01 |    -3
  13 | 2012-01-02 |     1
  14 | 2012-01-09 |     7
 111 | 2012-01-01 |    -8
 112 | 2012-01-01 |     0
 113 | 2012-01-01 |    30

With all I have learned I came to this solution (say solution 1, as there is another):

SELECT
    f.id,
    DATE_FORMAT(f.dateof, '%b %e, %Y') AS date,
    (SELECT DATEDIFF(f.dateof, f2.dateof)
        FROM foo f2
        WHERE f2.id = (
            SELECT MAX(f3.id) FROM foo f3 WHERE f3.id < f.id
        )
    ) AS days_diff
FROM foo f;

(example fiddle here: http://sqlfiddle.com/#!2/099fc/3).

This works like a charm... until there are just a couple of entries in db. It gets worse when more:

EXPLAIN:
id select_type        table type   possible_keys key     key_len ref    rows  Extra
1  PRIMARY            f     ALL    NULL          NULL    NULL    NULL   17221   
2  DEPENDENT SUBQUERY f2    eq_ref PRIMARY       PRIMARY 4       func   1     Using where
3  DEPENDENT SUBQUERY f3    index  PRIMARY       PRIMARY 4       NULL   17221 Using where; Using index

18031 rows: duration: 8.672 sec. Fetch: 228.515 sec.

I thought of adding index on dateof column:

CREATE TABLE `foo` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `dateof` date DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `dateof` (`dateof`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

...and gained tiny improvement:

EXPLAIN:
id select_type        table type   possible_keys key     key_len ref  rows  Extra
1  PRIMARY            f     index  NULL          dateof  4       NULL 18369 Using index
2  DEPENDENT SUBQUERY f2    eq_ref PRIMARY       PRIMARY 4       func 1     Using where
3  DEPENDENT SUBQUERY f3    index  PRIMARY       dateof  4       NULL 18369 Using where; Using index

18031 rows: duration: 8.406 sec. Fetch: 219.281 sec.

I recalled reading somewhere about advantages of MyISAM over InnoDB, in some cases. So I changed the to MyISAM:

ALTER TABLE `foo` ENGINE = MyISAM;

18031 rows: duration: 5.671 sec. Fetch: 151.610 sec.

Sure it's better but still slow.

I tried with another algorithm (solution 2):

SELECT
  f.id,
  DATE_FORMAT(f.dateof, '%b %e, %Y') AS date,
  (SELECT DATEDIFF(f.dateof, f2.dateof)
    FROM foo f2
    WHERE f2.id < f.id
    ORDER BY f2.id DESC
    LIMIT 1
  ) AS days_diff
FROM foo f;

...but it was even slower:

18031 rows: duration: 15.609 sec. Fetch: 184.656 sec.


Are there any other ways to optimize this query or data structure in order to have this task performed faster?

like image 477
Taz Avatar asked Apr 25 '12 17:04

Taz


1 Answers

It is no surprise that your approaches are very slow even for a moderately sized table.

It should theoretically be possible to calculate the result in O(n) time using the LAG analytical function, which unfortunately is not supported in MySQL. However you can emulate LAG in MySQL using variables:

SELECT
    id,
    DATE_FORMAT(f.dateof, '%b %e, %Y') AS date,
    DATEDIFF(dateof, @prev) AS days_diff,
    @prev := dateof
FROM FOO, (SELECT @prev := NULL) AS vars
ORDER BY id

This should be several orders of magnitude faster than what you are trying to do.

like image 147
Mark Byers Avatar answered Oct 18 '22 13:10

Mark Byers