Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to optimize query with many joins?

I have simple but long query which count the content of the result it takes about 14 seconds. the count itself on the main table takes less than a second but after multiple join the delay is too high as follow

Select  Count(Distinct visits.id) As Count_id
    From  visits
    Left Join  clients_locations  ON visits.client_location_id = clients_locations.id
    Left Join  clients  ON clients_locations.client_id = clients.id
    Left Join  locations  ON clients_locations.location_id = locations.id
    Left Join  users  ON visits.user_id = users.id
    Left Join  potentialities  ON clients_locations.potentiality = potentialities.id
    Left Join  classes  ON clients_locations.class = classes.id
    Left Join  professions  ON clients.profession_id = professions.id
    Inner Join  specialties  ON clients.specialty_id = specialties.id
    Left Join  districts  ON locations.district_id = districts.id
    Left Join  provinces  ON districts.province_id = provinces.id
    Left Join  locations_types  ON locations.location_type_id = locations_types.id
    Left Join  areas  ON clients_locations.area_id = areas.id
    Left Join  calls  ON calls.visit_id = visits.id 

The output of explain is

+---+---+---+---+---+---+---+---+---+---+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+---+---+---+---+---+---+---+---+---+---+
| 1 | SIMPLE | specialties | index | PRIMARY | specialty_name | 52 | NULL | 53 | Using index |
| 1 | SIMPLE | clients | ref | PRIMARY,specialty | specialty | 4 | crm_db.specialties.id | 143 |  |
| 1 | SIMPLE | clients_locations | ref | PRIMARY,client_id | client_id | 4 | crm_db.clients.id | 1 |  |
| 1 | SIMPLE | locations | eq_ref | PRIMARY | PRIMARY | 4 | crm_db.clients_locations.location_id | 1 |  |
| 1 | SIMPLE | districts | eq_ref | PRIMARY | PRIMARY | 4 | crm_db.locations.district_id | 1 | Using where |
| 1 | SIMPLE | visits | ref | unique_visit,client_location_id | unique_visit | 4 | crm_db.clients_locations.id | 4 | Using index |
| 1 | SIMPLE | calls | ref | call_unique,visit_id | call_unique | 4 | crm_db.visits.id | 1 | Using index |
+---+---+---+---+---+---+---+---+---+---+

Update 1 The above query used with dynamic where statement $sql = $sql . "Where ". $whereFilter but the i submitted it in simple form . So do not consider the answer just eleminate the joins :)

Update 2 Here is example of dynamic filtering

$temp = $this->province_id;
if ($temp != null) {
        $whereFilter = $whereFilter . " and provinces.id In ($temp) ";
    }

But in startup case which is our case no where statement

like image 850
Eslam Sameh Ahmed Avatar asked Aug 12 '16 17:08

Eslam Sameh Ahmed


1 Answers

Left joins always return a row from the first table, but may return multiple rows if there are multiple matching rows. But because you are counting distinct visit rows, left joining to another table while counting distinct visits is the same as just counting the rows of visits. Thus the only joins that affect the result are inner joins, so you can remove all "completely" left joined tables without affecting the result.

What I mean by "completely" is that some left joined tables are effectively inner joined; the inner join to specialty requires the join to clients to succeed and thus also be an inner join, which in turn requires the join to clients_locations to succeed and thus also be an inner join.

Your query (as posted) can be reduced to:

Select Count(Distinct visits.id) As Count_id
From visits
Join clients_locations ON visits.client_location_id = clients_locations.id
Join clients ON clients_locations.client_id = clients.id
Join specialties ON clients.specialty_id = specialties.id

Removing all those unnecessary joins will however greatly improve the runtime of your query, not only because there are less joins to make but also because the resulting rowset size could be enormous when you consider that the size is the product of the matches in all the tables (not the sum.

For maximum performance, create a covering indexes on all id-and-fk columns:

create index visits_id_client_location_id on visits(id, client_location_id);
create index clients_locations_id_client_id on clients_locations(id, client_id);
create index clients_id_specialty_id on clients(id, specialty_id);

so index-only scans can be used where possible. I assume there are indexes on the PK columns.

like image 124
Bohemian Avatar answered Oct 20 '22 15:10

Bohemian