+-------------------------------------------------+-----------------+---------------------+
| landing_page | all_impressions | dates |
+-------------------------------------------------+-----------------+---------------------+
| https://www.example.co.uk/url-1 | 53977 | 2018-08-19 13:59:40 |
| https://www.example.co.uk/url-1 | 610 | 2018-09-19 13:59:40 |
| https://www.example.co.uk/url-1 | 555 | 2018-10-19 13:59:40 |
| https://www.example.co.uk/url-1 | 23 | 2018-11-19 13:59:40 |
| https://www.example.co.uk/ | 1000 | 2018-06-19 13:59:40 |
| https://www.example.co.uk/ | 2 | 2018-07-19 13:59:40 |
| https://www.example.co.uk/ | 4 | 2018-08-19 13:59:40 |
| https://www.example.co.uk/ | 1563 | 2018-09-19 13:59:40 |
| https://www.example.co.uk/ | 1 | 2018-10-19 13:59:40 |
| https://www.example.co.uk/ | 9812 | 2018-11-19 13:59:40 |
+-------------------------------------------------+-----------------+---------------------+
With the above database table, I only want to select the landing_page if the impression count is the max for the current date - For example, from this, the select would return https://www.example.co.uk/ only as the current month it's all_impressions value is it's highest for November (https://www.example.co.uk/url-1 would not be selected as it's highest value was in August)
How might I do this with SQL?
index info:
mysql> show indexes from landing_pages_client_v3;
+-------------------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| landing_pages_client_v3 | 0 | PRIMARY | 1 | id | A | 24279939 | NULL | NULL | | BTREE | | |
| landing_pages_client_v3 | 1 | profile_id | 1 | profile_id | A | 17 | NULL | NULL | YES | BTREE | | |
| landing_pages_client_v3 | 1 | profile_id | 2 | dates | A | 17 | NULL | NULL | | BTREE | | |
| landing_pages_client_v3 | 1 | profile_id_2 | 1 | profile_id | A | 17 | NULL | NULL | YES | BTREE | | |
| landing_pages_client_v3 | 1 | profile_id_2 | 2 | lp_id | A | 6069984 | NULL | NULL | YES | BTREE | | |
+-------------------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
In a Derived Table, get the maximum value of all_impressions
for every landing_page
. Join back to the main table to get the row corresponding to maximum all_impressions
value.
We will eventually consider that row only if it belongs to Current Month. For sargability, we will not use functions on the dates
column. Instead, we will determine the first day of the current month and next month. We will consider those dates
which fall within this range. You can check details of the datetime functions here: https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html
For performance, you may also need to add the following composite index: (landing_page, all_impressions, dates)
. (I am not sure about which order these columns should be in. Maybe some benchmarking/trial is needed.
SELECT
t.*
FROM
your_table AS t
JOIN
(
SELECT
landing_page,
MAX(all_impressions) AS max_all_impressions
FROM your_table
GROUP BY landing_page
) AS dt
ON dt.landing_page = t.landing_page AND
dt.max_all_impressions = t.all_impressions
WHERE
t.dates >= ((LAST_DAY(CURDATE()) + INTERVAL 1 DAY) - INTERVAL 1 MONTH) AND
t.dates < (LAST_DAY(CURDATE()) + INTERVAL 1 DAY)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With