Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Optimize long query in mysql in a tremendous table size 33M rows

Tags:

sql

mysql

The query:

SELECT users.id as uid, name, avatar, avatar_date, driver, messages.id AS mid,messages.msg, messages.removed, messages.from_anonym_id, messages.t
    o_anonym_id, (messages.date DIV 1000) AS date, from_id = 162077 as outbox, !(0 in (SELECT read_state FROM messages as msgs 
WHERE (msgs.from_id = messages.from_id or msgs.from_id = messages.user_id) and msgs.user_id = 162077 and removed = 0)) as read_state 
FROM dialog, messages, users 
WHERE messages.id = dialog.mid and ((uid1 = 162077 and users.id = uid2) or (uid2 = 162077 and users.id = uid1) ) 
ORDER BY dialog.mid DESC LIMIT 0, 101;

Tables structure:

mysql> desc messages;
+----------------+------------------+------+-----+---------+----------------+
| Field          | Type             | Null | Key | Default | Extra          |
+----------------+------------------+------+-----+---------+----------------+
| id             | int(11)          | NO   | PRI | NULL    | auto_increment |
| from_id        | int(11)          | NO   | MUL | NULL    |                |
| user_id        | int(11)          | NO   | MUL | NULL    |                |
| group_id       | int(11)          | NO   |     | NULL    |                |
| to_number      | varchar(30)      | NO   | MUL | NULL    |                |
| msg            | text             | NO   |     | NULL    |                |
| image          | varchar(20)      | NO   |     | NULL    |                |
| date           | bigint(20)       | NO   |     | NULL    |                |
| read_state     | tinyint(1)       | NO   |     | 0       |                |
| removed        | tinyint(1)       | NO   | MUL | NULL    |                |
| from_anonym_id | int(10) unsigned | NO   | MUL | NULL    |                |
| to_anonym_id   | int(10) unsigned | NO   | MUL | NULL    |                |
+----------------+------------------+------+-----+---------+----------------+

mysql> desc dialog;
+----------------+------------------+------+-----+---------+----------------+
| Field          | Type             | Null | Key | Default | Extra          |
+----------------+------------------+------+-----+---------+----------------+
| id             | int(11)          | NO   | PRI | NULL    | auto_increment |
| uid1           | int(11)          | NO   | MUL | NULL    |                |
| uid2           | int(11)          | NO   | MUL | NULL    |                |
| mid            | int(11)          | NO   | MUL | NULL    |                |
| from_anonym_id | int(10) unsigned | NO   | MUL | NULL    |                |
| to_anonym_id   | int(10) unsigned | NO   | MUL | NULL    |                |
+----------------+------------------+------+-----+---------+----------------+


mysql> show index from messages;
+----------+------------+----------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table    | Non_unique | Key_name       | Seq_in_index | Column_name    | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+----------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| messages |          0 | PRIMARY        |            1 | id             | A         |    42944290 |     NULL | NULL   |      | BTREE      |         |               |
| messages |          1 | user_id_2      |            1 | user_id        | A         |     2147214 |     NULL | NULL   |      | BTREE      |         |               |
| messages |          1 | user_id_2      |            2 | read_state     | A         |     2862952 |     NULL | NULL   |      | BTREE      |         |               |
| messages |          1 | user_id_2      |            3 | removed        | A         |     2862952 |     NULL | NULL   |      | BTREE      |         |               |
| messages |          1 | from_id        |            1 | from_id        | A         |      825851 |     NULL | NULL   |      | BTREE      |         |               |
| messages |          1 | from_id        |            2 | to_number      | A         |      825851 |     NULL | NULL   |      | BTREE      |         |               |
| messages |          1 | to_number      |            1 | to_number      | A         |          29 |     NULL | NULL   |      | BTREE      |         |               |
| messages |          1 | idx_user_id    |            1 | user_id        | A         |     2044966 |     NULL | NULL   |      | BTREE      |         |               |
| messages |          1 | idx_from_id    |            1 | from_id        | A         |      447336 |     NULL | NULL   |      | BTREE      |         |               |
| messages |          1 | removed        |            1 | removed        | A         |          29 |     NULL | NULL   |      | BTREE      |         |               |
| messages |          1 | from_anonym_id |            1 | from_anonym_id | A         |          29 |     NULL | NULL   |      | BTREE      |         |               |
| messages |          1 | to_anonym_id   |            1 | to_anonym_id   | A         |          29 |     NULL | NULL   |      | BTREE      |         |               |
+----------+------------+----------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
12 rows in set (0.01 sec)

mysql> show index from dialog;
+--------+------------+----------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table  | Non_unique | Key_name       | Seq_in_index | Column_name    | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+----------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| dialog |          0 | PRIMARY        |            1 | id             | A         |     6378161 |     NULL | NULL   |      | BTREE      |         |               |
| dialog |          1 | uid1           |            1 | uid1           | A         |      455582 |     NULL | NULL   |      | BTREE      |         |               |
| dialog |          1 | uid1           |            2 | uid2           | A         |     6378161 |     NULL | NULL   |      | BTREE      |         |               |
| dialog |          1 | uid2           |            1 | uid2           | A         |     2126053 |     NULL | NULL   |      | BTREE      |         |               |
| dialog |          1 | idx_mid        |            1 | mid            | A         |     6378161 |     NULL | NULL   |      | BTREE      |         |               |
| dialog |          1 | from_anonym_id |            1 | from_anonym_id | A         |          17 |     NULL | NULL   |      | BTREE      |         |               |
| dialog |          1 | to_anonym_id   |            1 | to_anonym_id   | A         |          17 |     NULL | NULL   |      | BTREE      |         |               |
+--------+------------+----------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

PS please do not advise me any theoretical recipe, only practical examples. Thx in advance.

If I remove this statement

!(0 in (SELECT read_state FROM messages as msgs 
WHERE (msgs.from_id = messages.from_id or msgs.from_id = messages.user_id) and msgs.user_id = 162077 and removed = 0)) as read_state

query works very well in comparison to original: 101 rows in set (0.04 sec)

I suppose this is the main issue, but I need this field out there. May be someone can turn this round and make it faster, would be very pleased.

like image 484
Arthur Kushman Avatar asked Sep 05 '14 16:09

Arthur Kushman


2 Answers

This is your query with the join syntax fixed and table aliases added for the tables in the outer query:

SELECT u.id as uid, name, avatar, avatar_date, driver, m.id AS mid, m.msg,
       m.removed, m.from_anonym_id, m.t
       o_anonym_id, (m.date DIV 1000) AS date, from_id = 162077 as outbox,
       !(0 in (SELECT read_state
               FROM messages m2 
               WHERE (m2.from_id = m.from_id or m2.from_id = m.user_id) and
                     m2.user_id = 162077 and removed = 0
              )
        ) as read_state 
FROM dialog d join
     messages m
     on m.id = d.mid join
     users u
     on (uid1 = 162077 and users.id = uid2) or
        (uid2 = 162077 and users.id = uid1)
ORDER BY d.mid DESC
LIMIT 0, 101;

If the query works well without the subquery in the select clause, I would recommend replacing that. in can be an expensive operator, particularly with or on the conditions. So I would recommend replacing it with:

 (case when exists (select 1
                    from messages m2
                    where m2.user_id = 162077 and m2.removed = 0 and
                          m2.from_id = m.from_id and m2.read_state = 0
                   )
       then 0
       when exists (select 1
                    from messages m2
                    where m2.user_id = 162077 and m2.removed = 0 and
                          m2.from_id = m.user_id and m2.read_state = 0
                   )
       then 0
       else 1
   end)

And, you want an index on messages(from_id, user_id, removed, read_state).

like image 159
Gordon Linoff Avatar answered Sep 19 '22 14:09

Gordon Linoff


I would start with an index on the messages table. A compound index to help cover a join as I have in the sample query below... Index on ( user_id, removed, read_state, from_id ).

Next, explanation of my process. I am doing a preliminary query from the dialogs table as a UNION, but of each respectively grabbing the opposite ID for the "LinkToUser" for next cycle linking to user table once vs an "OR" join result as you had in the where clause. Getting qualified records up-front and simplified might help you out.

Next part is where the index will come in for your messages. I am doing a left-join based on the specific user, removed = 0 and SPECIFICALLY the read_state = 0. By using the index, it will either find a match or it wont. So your Selected field clause of ( ! 0 in ... ) is just simplified to an IS NULL check.

SELECT 
      u.id as uid, 
      u.name, 
      avatar, 
      avatar_date, 
      driver, 
      m.id AS mid,
      m.msg, 
      m.removed, 
      m.from_anonym_id, 
      m.to_anonym_id, 
      (m.date DIV 1000) AS date, 
      from_id = 162077 as outbox, 
      msgFrom.from_id IS NULL  as read_state
   FROM 
      ( select distinct d1.*, d1.uid2 as LinkToUser
           from dialog d1
           where d1.uid1 = 162077
        union select d2.*, d2.uid1 as LinkToUser
           from dialog d2
           where d2.uid2 = 162077 ) Qualified

         JOIN Users u
            ON Qualified.LinkToUser = u.id

         JOIN Messages m
            ON Qualified.mid = m.id

            LEFT JOIN Messages msgFrom
               ON msgFrom.user_id = 160277
               AND msgFrom.Removed = 0
               AND msgFrom.Read_State = 0
               AND (  m.from_id = msgFrom.from_id 
                   OR m.user_id = msgFrom.from_id )

   ORDER BY 
      Qualified.mid DESC 
   LIMIT 
      0, 101;

you may need to play with it a bit, maybe change to something like..

if( msgFrom.from_id IS NULL, 0, msgFrom.read_state ) as Read_State

CLARIFICATION

Zeusakm, your individual field for the read_state as written will ONLY return a 1 or 0 as it is a logical condition of NOT a value of zero in a selected list of messages. It will never return a -1 as you indicated in your comment. My version does the same thing. If it DOES find a zero, return zero.. if it can not find a zero, it returns 1 as the compare value would be NULL and thus a "IsThisValue IS NULL" returns true which is the same as a flag of 1.

So, hopefully that clarifies what I was doing with the left-join for you. Explicitly look for the userID, removed state and read state and (from or user id match).

like image 32
DRapp Avatar answered Sep 19 '22 14:09

DRapp