Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

mysql efficient join of 2 tables to the same 2 tables

I have 2 tables that can be simplified to this structure:

Table 1:

+----+----------+---------------------+-------+
| id | descr_id |        date         | value |
+----+----------+---------------------+-------+
| 1  |        1 | 2013-09-20 16:39:06 |     1 |
+----+----------+---------------------+-------+
| 2  |        2 | 2013-09-20 16:44:06 |     1 |
+----+----------+---------------------+-------+
| 3  |        3 | 2013-09-20 16:49:06 |     5 |
+----+----------+---------------------+-------+
| 4  |        4 | 2013-09-20 16:44:06 |   894 |
+----+----------+---------------------+-------+

Table 2:

+----------+-------------+
| descr_id | description |
+----------+-------------+
|       1  | abc         |
+----------+-------------+
|       2  | abc         |
+----------+-------------+
|       3  | abc         |
+----------+-------------+
|       4  | DEF         |
+----------+-------------+

I want to join the description into table1, filter by description so I only get rows where description=abc, and filter out "duplicate" rows, where two rows are duplicates if they have the same value and their dates are within 6 minutes of one another. My desired output table is below, (assuming abc is the desired description filter).

+----+----------+---------------------+-------+-------------+
| id | descr_id |        date         | value | description |
+----+----------+---------------------+-------+-------------+
| 1  |        1 | 2013-09-20 16:39:06 |     1 | abc         |
+----+----------+---------------------+-------+-------------+
| 3  |        3 | 2013-09-20 16:49:06 |     5 | abc         |
+----+----------+---------------------+-------+-------------+

The query I came up with is:

select * 
  from (
        select * 
          from table1 
          join table2 using(descr_id) 
         where label='abc'
       ) t1 
  left join (
        select * 
          from table1 
          join table2 using(descr_id) 
         where label='abc'
        ) t2 on( t1.date<t2.date and t1.date + interval 6 minute > t2.date) 
 where t1.value=t2.value.

Unfortunately this query takes over a minute to run with my dataset, and returns no results (although I believe there should be results). Is there a more efficient way to perform this query? Is there a way to name a derived table and reference it later in the same query? Also, why does my query return no results?

Thanks in advance for help!

edit: I would like to retain the first of several samples with close-together timestamps.

My table1 has 6.1 million rows, my table2 has 30K, which makes me realize that table2 would only have one row for description "abc". This means that I can just query for descr_id beforehand, then use that id to avoid joining table2 in the big query at all, making it much more efficient. However, if my table2 was setup as outlined above (which would be poor database design, I admit) what is a good way to perform such a query?

like image 281
Eric Fitting Avatar asked Nov 12 '22 21:11

Eric Fitting


1 Answers

Try creating temporary tables and joining on the temporary tables:

CREATE TEMPORARY TABLE t1 AS (select * 
          FROM table1 
          JOIN table2 USING(descr_id) 
         WHERE label='abc')

CREATE TEMPORARY TABLE t2 AS (select * 
          FROM table1 
          JOIN table2 USING(descr_id) 
         WHERE label='abc')

SELECT *
FROM t1
LEFT JOIN t2 on( t1.date<t2.date and t1.date + interval 6 minute > t2.date) 
WHERE t1.value=t2.value

Temporary tables are automatically cleaned up after you disconnect from your database so there is no need to explicitly drop them.

I originally had this, but I don't believe it achieves the full requirements:

SELECT t1.id,
       t1.descr_id,
       t1.date,
       t1.value,
       t2.description
FROM table1 t1
JOIN table2 t2 ON t1.descr_id = t2.descr_id
WHERE t2.description = 'abc' 

This is essentially the same as the original query, however another option might be to create a view and join on the view like this:

CREATE VIEW v1 AS
SELECT * FROM table1 JOIN table2 USING(descr_id) WHERE label='abc'

CREATE VIEW v2 AS
SELECT * FROM table1 JOIN table2 USING(descr_id) WHERE label='abc'

SELECT *
FROM v1
LEFT JOIN v2 on( v1.date<v2.date and v1.date + interval 6 minute > v2.date) 
WHERE v1.value=v2.value

Also, if you run this query on a regular basis, you might consider loading the results from your first query into a staging table and doing your join on the staging table like this:

INSERT INTO staging
(SELECT * 
        FROM table1 
        JOIN table2 USING(descr_id) 
        WHERE label='abc')

SELECT *
    FROM staging s1
    LEFT JOIN staging s2 on( s1.date<s2.date and s1.date + interval 6 minute > s2.date) 
    WHERE s1.value=s2.value

TRUNCATE TABLE staging
like image 152
Cameron Tinker Avatar answered Nov 14 '22 21:11

Cameron Tinker