Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to select the highest value for a given month?

Tags:

mysql

+-------------------------------------------------+-----------------+---------------------+
| landing_page                                    | all_impressions | dates               |
+-------------------------------------------------+-----------------+---------------------+
| https://www.example.co.uk/url-1                 |           53977 | 2018-08-19 13:59:40 |
| https://www.example.co.uk/url-1                 |             610 | 2018-09-19 13:59:40 |
| https://www.example.co.uk/url-1                 |            555  | 2018-10-19 13:59:40 |
| https://www.example.co.uk/url-1                 |            23   | 2018-11-19 13:59:40 |
| https://www.example.co.uk/                      |            1000 | 2018-06-19 13:59:40 |
| https://www.example.co.uk/                      |               2 | 2018-07-19 13:59:40 |
| https://www.example.co.uk/                      |               4 | 2018-08-19 13:59:40 |
| https://www.example.co.uk/                      |            1563 | 2018-09-19 13:59:40 |
| https://www.example.co.uk/                      |               1 | 2018-10-19 13:59:40 |
| https://www.example.co.uk/                      |            9812 | 2018-11-19 13:59:40 |
+-------------------------------------------------+-----------------+---------------------+

With the above database table, I only want to select the landing_page if the impression count is the max for the current date - For example, from this, the select would return https://www.example.co.uk/ only as the current month it's all_impressions value is it's highest for November (https://www.example.co.uk/url-1  would not be selected as it's highest value was in August)

How might I do this with SQL?

index info:

mysql> show indexes from landing_pages_client_v3;
+-------------------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table                   | Non_unique | Key_name     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| landing_pages_client_v3 |          0 | PRIMARY      |            1 | id          | A         |    24279939 |     NULL | NULL   |      | BTREE      |         |               |
| landing_pages_client_v3 |          1 | profile_id   |            1 | profile_id  | A         |          17 |     NULL | NULL   | YES  | BTREE      |         |               |
| landing_pages_client_v3 |          1 | profile_id   |            2 | dates       | A         |          17 |     NULL | NULL   |      | BTREE      |         |               |
| landing_pages_client_v3 |          1 | profile_id_2 |            1 | profile_id  | A         |          17 |     NULL | NULL   | YES  | BTREE      |         |               |
| landing_pages_client_v3 |          1 | profile_id_2 |            2 | lp_id       | A         |     6069984 |     NULL | NULL   | YES  | BTREE      |         |               |
+-------------------------+------------+--------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
like image 976
Adders Avatar asked Oct 16 '22 10:10

Adders


1 Answers

In a Derived Table, get the maximum value of all_impressions for every landing_page. Join back to the main table to get the row corresponding to maximum all_impressions value.

We will eventually consider that row only if it belongs to Current Month. For sargability, we will not use functions on the dates column. Instead, we will determine the first day of the current month and next month. We will consider those dates which fall within this range. You can check details of the datetime functions here: https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html

For performance, you may also need to add the following composite index: (landing_page, all_impressions, dates). (I am not sure about which order these columns should be in. Maybe some benchmarking/trial is needed.

SELECT
  t.* 
FROM 
  your_table AS t 
JOIN 
(
  SELECT
    landing_page, 
    MAX(all_impressions) AS max_all_impressions
  FROM your_table 
  GROUP BY landing_page
) AS dt 
  ON dt.landing_page = t.landing_page AND 
     dt.max_all_impressions = t.all_impressions 
WHERE 
 t.dates >= ((LAST_DAY(CURDATE()) + INTERVAL 1 DAY) - INTERVAL 1 MONTH) AND
 t.dates < (LAST_DAY(CURDATE()) + INTERVAL 1 DAY) 
like image 171
Madhur Bhaiya Avatar answered Oct 21 '22 02:10

Madhur Bhaiya