Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MySQL Subquery with User-Defined Variables

Tags:

mysql

I'm trying to accomplish a query that requires a calculated column using a subquery that passes the date reference via a variable. I'm not sure if I'm not "doing it right" but essentially the query never finishes and spins for minutes on end. This is my query:

select @groupdate:=date_format(order_date,'%Y-%m'), count(distinct customer_email) as num_cust,
(
  select count(distinct cev.customer_email) as num_prev
  from _pj_cust_email_view cev
  inner join _pj_cust_email_view as prev_purch on (prev_purch.order_date < @groupdate) and (cev.customer_email=prev_purch.customer_email)
  where cev.order_date > @groupdate
) as prev_cust_count
from _pj_cust_email_view
group by @groupdate;

Subquery has an inner join accomplishes the self-join that only gives me the count of people that have purchased prior to the date in @groupdate. The EXPLAIN is below:

+----+----------------------+---------------------+------+---------------+-----------+---------+---------------------------+--------+---------------------------------+
| id | select_type          | table               | type | possible_keys | key       | key_len | ref                       | rows   | Extra                           |
+----+----------------------+---------------------+------+---------------+-----------+---------+---------------------------+--------+---------------------------------+
|  1 | PRIMARY              | _pj_cust_email_view | ALL  | NULL          | NULL      | NULL    | NULL                      | 140147 | Using temporary; Using filesort |
|  2 | UNCACHEABLE SUBQUERY | cev                 | ALL  | IDX_EMAIL     | NULL      | NULL    | NULL                      | 140147 | Using where                     |
|  2 | UNCACHEABLE SUBQUERY | prev_purch          | ref  | IDX_EMAIL     | IDX_EMAIL | 768     | cart_A.cev.customer_email |      1 | Using where                     |
+----+----------------------+---------------------+------+---------------+-----------+---------+---------------------------+--------+---------------------------------+

And the structure of the table _pj_cust_email_view is as such:

'_pj_cust_email_view', 'CREATE TABLE `_pj_cust_email_view` (
  `order_date` varchar(10) CHARACTER SET utf8 DEFAULT NULL,
  `customer_email` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
  KEY `IDX_EMAIL` (`customer_email`),
  KEY `IDX_ORDERDATE` (`order_date`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1'

Again, as I said earlier, I'm not really sure that this is the best way to accomplish this. Any criticism, direction is appreciated!

Update

I've made a little progress, and I'm now doing the above procedurally by iterating through all known months instead of months in the database and setting the vars ahead of time. I don't like this still. This is what I've got now:

Sets the user defined vars

set @startdate:='2010-08', @enddate:='2010-09';

Gets total distinct emails in the given range

select count(distinct customer_email) as num_cust
from _pj_cust_email_view
where order_date between @startdate and @enddate;

Gets the total count of customers who had purchased prior to the given range

select count(distinct cev.customer_email) as num_prev
  from _pj_cust_email_view cev
  inner join _pj_cust_email_view as prev_purch on (prev_purch.order_date < @startdate) and (cev.customer_email=prev_purch.customer_email)
  where cev.order_date between @startdate and @enddate;

Where @startdate is set to the start of the month and @enddate signifies the end of that month's range.

I really feel like this still can be done in one full query.

like image 403
philwinkle Avatar asked Feb 23 '11 16:02

philwinkle


People also ask

How can you set user defined values using a MySQL statement?

One way to set a user-defined variable is by issuing a SET statement: SET @var_name = expr [, @var_name = expr] ... For SET , either = or := can be used as the assignment operator.

Can we declare variable in MySQL query?

You can declare a variable using @anyVariablename which is a session variable. To create a session variable, you need to use SET command.

Can we use having clause in subquery?

Subqueries in a HAVING clause HAVING clause is used to filter groups of rows. You may place a subquery in HAVING clause in an outer query. This allows you to filter groups of rows based on the result returned by your subquery.

Is subquery allowed in WHERE clause in MySQL?

In MySQL subquery can be nested inside a SELECT, INSERT, UPDATE, DELETE, SET, or DO statement or inside another subquery. A subquery is usually added within the WHERE Clause of another SQL SELECT statement.


1 Answers

I don't think you need to use subqueries at all, nor do you need to iterate over months.

Instead, I recommend you create a table to store all months. Even if you prepopulate it with 100 years of months, it would only have 1200 rows in it, which is trivial.

CREATE TABLE Months (
    start_date DATE, 
    end_date DATE, 
    PRIMARY KEY (start_date, end_date)
);
INSERT INTO Months (start_date, end_date) 
VALUES ('2011-03-01', '2011-03-31');

Store the actual start and end dates, so you can use the DATE data type and index the two columns properly.

edit: I think I understand your requirement a bit better, and I've cleaned up this answer. The following query may be right for you:

SELECT DATE_FORMAT(m.start_date, '%Y-%m') AS month,
  COUNT(DISTINCT cev.customer_email) AS current,
  GROUP_CONCAT(DISTINCT cev.customer_email) AS current_email,
  COUNT(DISTINCT prev.customer_email) AS earlier,
  GROUP_CONCAT(DISTINCT prev.customer_email) AS earlier_email
FROM Months AS m 
LEFT OUTER JOIN _pj_cust_email_view AS cev
  ON cev.order_date BETWEEN m.start_date AND m.end_date
INNER JOIN Months AS mprev
  ON mprev.start_date <= m.start_date
LEFT OUTER JOIN _pj_cust_email_view AS prev
  ON prev.order_date BETWEEN mprev.start_date AND mprev.end_date
GROUP BY month;

If you create the following compound index in your table:

CREATE INDEX order_email on _pj_cust_email_view (order_date, customer_email);

Then the query has the best chance of being an index-only query, and will run a lot faster.

Below is the EXPLAIN optimization report from this query. Note type: index for each table.

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: m
         type: index
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 6
          ref: NULL
         rows: 4
        Extra: Using index; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: mprev
         type: index
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 6
          ref: NULL
         rows: 4
        Extra: Using where; Using index; Using join buffer
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: cev
         type: index
possible_keys: order_email
          key: order_email
      key_len: 17
          ref: NULL
         rows: 10
        Extra: Using index
*************************** 4. row ***************************
           id: 1
  select_type: SIMPLE
        table: prev
         type: index
possible_keys: order_email
          key: order_email
      key_len: 17
          ref: NULL
         rows: 10
        Extra: Using index

Here's some test data:

INSERT INTO Months (start_date, end_date) VALUES
('2011-03-01', '2011-03-31'),
('2011-02-01', '2011-02-28'),
('2011-01-01', '2011-01-31'),
('2010-12-01', '2010-12-31');

INSERT INTO _pj_cust_email_view VALUES
('ron', '2011-03-10'),
('hermione', '2011-03-15'),
('hermione', '2011-02-15'),
('hermione', '2011-01-15'),
('hermione', '2010-12-15'),
('neville', '2011-01-10'),
('harry', '2011-03-19'),
('harry', '2011-02-10'),
('molly', '2011-03-25'),
('molly', '2011-01-10');

Here's the result given that data, including the concatenated list of emails to make it easier to see.

+---------+---------+--------------------------+---------+----------------------------------+
| month   | current | current_email            | earlier | earlier_email                    |
+---------+---------+--------------------------+---------+----------------------------------+
| 2010-12 |       1 | hermione                 |       1 | hermione                         | 
| 2011-01 |       3 | neville,hermione,molly   |       3 | hermione,molly,neville           | 
| 2011-02 |       2 | hermione,harry           |       4 | harry,hermione,molly,neville     | 
| 2011-03 |       4 | molly,ron,harry,hermione |       5 | molly,ron,hermione,neville,harry | 
+---------+---------+--------------------------+---------+----------------------------------+
like image 183
Bill Karwin Avatar answered Oct 22 '22 02:10

Bill Karwin