
How can I optimize this query produced by SQLAlchemy?

I have a query generated by the SQLAlchemy ORM. It is supposed to retrieve stream_items for a specific course, along with all of their parts (resources, content text blocks, etc.) and the users who posted them. However, this query is extremely slow: it takes minutes on our production database, which has 20,000 or so users, 25 or so stream_items for the course, and a couple of content text blocks per stream_item. Note that there are very few records of any kind other than users in the database, because we imported a bunch of users but very little content.

Edit: Note that every object id is a foreign key into the franklin_object table.

I've looked at the query's EXPLAIN output and identified several troubling bits:

  1. One of the lookups is 'Using temporary; Using filesort'.
  2. The user table is hit twice with no index
  3. The content text block table is hit twice with no index

However, I really don't know what to do about these, especially the latter two issues.

Here is the query:

SELECT stream_item.id                               AS stream_item_id,
       franklin_object.id                           AS franklin_object_id,
       franklin_object.type                         AS franklin_object_type,
       franklin_object.uuid                         AS franklin_object_uuid,
       stream_item.parent_id                        AS stream_item_parent_id,
       stream_item.shown_at                         AS stream_item_shown_at,
       stream_item.author_id                        AS stream_item_author_id,
       stream_item.stream_sort_at                   AS stream_item_stream_sort_at,
       stream_item.PUBLIC                           AS stream_item_public,
       stream_item.created_at                       AS stream_item_created_at,
       stream_item.updated_at                       AS stream_item_updated_at,
       anon_1.content_text_block_text               AS anon_1_content_text_block_text,
       anon_2.resource_id                           AS anon_2_resource_id,
       anon_2.franklin_object_id                    AS anon_2_franklin_object_id,
       anon_2.franklin_object_type                  AS anon_2_franklin_object_type,
       anon_2.franklin_object_uuid                  AS anon_2_franklin_object_uuid,
       anon_2.resource_top_parent_resource          AS anon_2_resource_top_parent_resource,
       anon_2.resource_top_parent_id                AS anon_2_resource_top_parent_id,
       anon_2.resource_title                        AS anon_2_resource_title,
       anon_2.resource_url                          AS anon_2_resource_url,
       anon_2.resource_image                        AS anon_2_resource_image,
       anon_2.resource_created_at                   AS anon_2_resource_created_at,
       anon_2.resource_updated_at                   AS anon_2_resource_updated_at,
       franklin_object_1.id                         AS franklin_object_1_id,
       franklin_object_1.type                       AS franklin_object_1_type,
       franklin_object_1.uuid                       AS franklin_object_1_uuid,
       anon_1.content_text_block_id                 AS anon_1_content_text_block_id,
       anon_1.franklin_object_id                    AS anon_1_franklin_object_id,
       anon_1.franklin_object_type                  AS anon_1_franklin_object_type,
       anon_1.franklin_object_uuid                  AS anon_1_franklin_object_uuid,
       anon_1.content_text_block_position           AS anon_1_content_text_block_position,
       anon_1.content_text_block_franklin_object_id AS anon_1_content_text_block_franklin_object_id,
       anon_1.content_text_block_created_at         AS anon_1_content_text_block_created_at,
       anon_1.content_text_block_updated_at         AS anon_1_content_text_block_updated_at,
       anon_3.user_password                         AS anon_3_user_password,
       anon_3.user_auth_token                       AS anon_3_user_auth_token,
       anon_3.user_id                               AS anon_3_user_id,
       anon_3.franklin_object_id                    AS anon_3_franklin_object_id,
       anon_3.franklin_object_type                  AS anon_3_franklin_object_type,
       anon_3.franklin_object_uuid                  AS anon_3_franklin_object_uuid,
       anon_3.user_email                            AS anon_3_user_email,
       anon_3.user_auth_token_expiration            AS anon_3_user_auth_token_expiration,
       anon_3.user_active                           AS anon_3_user_active,
       anon_3.user_activation_token                 AS anon_3_user_activation_token,
       anon_3.user_first_name                       AS anon_3_user_first_name,
       anon_3.user_last_name                        AS anon_3_user_last_name,
       anon_3.user_image                            AS anon_3_user_image,
       anon_3.user_bio                              AS anon_3_user_bio,
       anon_3.user_aspirations                      AS anon_3_user_aspirations,
       anon_3.user_website                          AS anon_3_user_website,
       anon_3.user_resume                           AS anon_3_user_resume,
       anon_3.user_resume_name                      AS anon_3_user_resume_name,
       anon_3.user_primary_role                     AS anon_3_user_primary_role,
       anon_3.user_institution_id                   AS anon_3_user_institution_id,
       anon_3.user_birth_date                       AS anon_3_user_birth_date,
       anon_3.user_gender                           AS anon_3_user_gender,
       anon_3.user_graduation_year                  AS anon_3_user_graduation_year,
       anon_3.user_complete                         AS anon_3_user_complete,
       anon_3.user_masthead_y_position              AS anon_3_user_masthead_y_position,
       anon_3.user_masthead                         AS anon_3_user_masthead,
       anon_3.user_fb_access_token                  AS anon_3_user_fb_access_token,
       anon_3.user_fb_user_id                       AS anon_3_user_fb_user_id,
       anon_3.user_location                         AS anon_3_user_location,
       anon_3.user_created_at                       AS anon_3_user_created_at,
       anon_3.user_updated_at                       AS anon_3_user_updated_at,
       anon_4.content_text_block_text               AS anon_4_content_text_block_text,
       anon_4.content_text_block_id                 AS anon_4_content_text_block_id,
       anon_4.franklin_object_id                    AS anon_4_franklin_object_id,
       anon_4.franklin_object_type                  AS anon_4_franklin_object_type,
       anon_4.franklin_object_uuid                  AS anon_4_franklin_object_uuid,
       anon_4.content_text_block_position           AS anon_4_content_text_block_position,
       anon_4.content_text_block_franklin_object_id AS anon_4_content_text_block_franklin_object_id,
       anon_4.content_text_block_created_at         AS anon_4_content_text_block_created_at,
       anon_4.content_text_block_updated_at         AS anon_4_content_text_block_updated_at,
       anon_5.user_password                         AS anon_5_user_password,
       anon_5.user_auth_token                       AS anon_5_user_auth_token,
       anon_5.user_id                               AS anon_5_user_id,
       anon_5.franklin_object_id                    AS anon_5_franklin_object_id,
       anon_5.franklin_object_type                  AS anon_5_franklin_object_type,
       anon_5.franklin_object_uuid                  AS anon_5_franklin_object_uuid,
       anon_5.user_email                            AS anon_5_user_email,
       anon_5.user_auth_token_expiration            AS anon_5_user_auth_token_expiration,
       anon_5.user_active                           AS anon_5_user_active,
       anon_5.user_activation_token                 AS anon_5_user_activation_token,
       anon_5.user_first_name                       AS anon_5_user_first_name,
       anon_5.user_last_name                        AS anon_5_user_last_name,
       anon_5.user_image                            AS anon_5_user_image,
       anon_5.user_bio                              AS anon_5_user_bio,
       anon_5.user_aspirations                      AS anon_5_user_aspirations,
       anon_5.user_website                          AS anon_5_user_website,
       anon_5.user_resume                           AS anon_5_user_resume,
       anon_5.user_resume_name                      AS anon_5_user_resume_name,
       anon_5.user_primary_role                     AS anon_5_user_primary_role,
       anon_5.user_institution_id                   AS anon_5_user_institution_id,
       anon_5.user_birth_date                       AS anon_5_user_birth_date,
       anon_5.user_gender                           AS anon_5_user_gender,
       anon_5.user_graduation_year                  AS anon_5_user_graduation_year,
       anon_5.user_complete                         AS anon_5_user_complete,
       anon_5.user_masthead_y_position              AS anon_5_user_masthead_y_position,
       anon_5.user_masthead                         AS anon_5_user_masthead,
       anon_5.user_fb_access_token                  AS anon_5_user_fb_access_token,
       anon_5.user_fb_user_id                       AS anon_5_user_fb_user_id,
       anon_5.user_location                         AS anon_5_user_location,
       anon_5.user_created_at                       AS anon_5_user_created_at,
       anon_5.user_updated_at                       AS anon_5_user_updated_at,
       anon_6.stream_item_id                        AS anon_6_stream_item_id,
       anon_6.franklin_object_id                    AS anon_6_franklin_object_id,
       anon_6.franklin_object_type                  AS anon_6_franklin_object_type,
       anon_6.franklin_object_uuid                  AS anon_6_franklin_object_uuid,
       anon_6.stream_item_parent_id                 AS anon_6_stream_item_parent_id,
       anon_6.stream_item_shown_at                  AS anon_6_stream_item_shown_at,
       anon_6.stream_item_author_id                 AS anon_6_stream_item_author_id,
       anon_6.stream_item_stream_sort_at            AS anon_6_stream_item_stream_sort_at,
       anon_6.stream_item_public                    AS anon_6_stream_item_public,
       anon_6.stream_item_created_at                AS anon_6_stream_item_created_at,
       anon_6.stream_item_updated_at                AS anon_6_stream_item_updated_at
FROM   franklin_object
       INNER JOIN stream_item
               ON franklin_object.id = stream_item.id
       INNER JOIN (SELECT franklin_object.id                    AS franklin_object_id,
                          franklin_object.type                  AS franklin_object_type,
                          franklin_object.uuid                  AS franklin_object_uuid,
                          content_text_block.id                 AS content_text_block_id,
                          content_text_block.text               AS content_text_block_text,
                          content_text_block.position           AS content_text_block_position,
                          content_text_block.franklin_object_id AS content_text_block_franklin_object_id,
                          content_text_block.created_at         AS content_text_block_created_at,
                          content_text_block.updated_at         AS content_text_block_updated_at
                   FROM   franklin_object
                          INNER JOIN content_text_block
                                  ON franklin_object.id = content_text_block.id) AS anon_1
               ON stream_item.id = anon_1.content_text_block_franklin_object_id
       LEFT OUTER JOIN contents_resources AS contents_resources_1
                    ON anon_1.content_text_block_id = contents_resources_1.content_id
       LEFT OUTER JOIN (SELECT franklin_object.id           AS franklin_object_id,
                               franklin_object.type         AS franklin_object_type,
                               franklin_object.uuid         AS franklin_object_uuid,
                               resource.id                  AS resource_id,
                               resource.top_parent_resource AS resource_top_parent_resource,
                               resource.top_parent_id       AS resource_top_parent_id,
                               resource.title               AS resource_title,
                               resource.url                 AS resource_url,
                               resource.image               AS resource_image,
                               resource.created_at          AS resource_created_at,
                               resource.updated_at          AS resource_updated_at
                        FROM   franklin_object
                               INNER JOIN resource
                                       ON franklin_object.id = resource.id) AS anon_2
                    ON anon_2.resource_id = contents_resources_1.resource_id
       LEFT OUTER JOIN contents_franklin_objects AS contents_franklin_objects_1
                    ON anon_1.content_text_block_id = contents_franklin_objects_1.content_id
       LEFT OUTER JOIN franklin_object AS franklin_object_1
                    ON franklin_object_1.id = contents_franklin_objects_1.franklin_object_id
       LEFT OUTER JOIN likers AS likers_1
                    ON stream_item.id = likers_1.post_id
       LEFT OUTER JOIN (SELECT franklin_object.id         AS franklin_object_id,
                               franklin_object.type       AS franklin_object_type,
                               franklin_object.uuid       AS franklin_object_uuid,
                               USER.id                    AS user_id,
                               USER.email                 AS user_email,
                               USER.password              AS user_password,
                               USER.auth_token            AS user_auth_token,
                               USER.auth_token_expiration AS user_auth_token_expiration,
                               USER.active                AS user_active,
                               USER.activation_token      AS user_activation_token,
                               USER.first_name            AS user_first_name,
                               USER.last_name             AS user_last_name,
                               USER.image                 AS user_image,
                               USER.bio                   AS user_bio,
                               USER.aspirations           AS user_aspirations,
                               USER.website               AS user_website,
                               USER.resume                AS user_resume,
                               USER.resume_name           AS user_resume_name,
                               USER.primary_role          AS user_primary_role,
                               USER.institution_id        AS user_institution_id,
                               USER.birth_date            AS user_birth_date,
                               USER.gender                AS user_gender,
                               USER.graduation_year       AS user_graduation_year,
                               USER.complete              AS user_complete,
                               USER.masthead_y_position   AS user_masthead_y_position,
                               USER.masthead              AS user_masthead,
                               USER.fb_access_token       AS user_fb_access_token,
                               USER.fb_user_id            AS user_fb_user_id,
                               USER.location              AS user_location,
                               USER.created_at            AS user_created_at,
                               USER.updated_at            AS user_updated_at
                        FROM   franklin_object
                               INNER JOIN USER
                                       ON franklin_object.id = USER.id) AS anon_3
                    ON anon_3.user_id = likers_1.user_id
       LEFT OUTER JOIN contents_franklin_objects AS contents_franklin_objects_2
                    ON franklin_object.id = contents_franklin_objects_2.franklin_object_id
       LEFT OUTER JOIN (SELECT franklin_object.id                    AS franklin_object_id,
                               franklin_object.type                  AS franklin_object_type,
                               franklin_object.uuid                  AS franklin_object_uuid,
                               content_text_block.id                 AS content_text_block_id,
                               content_text_block.text               AS content_text_block_text,
                               content_text_block.position           AS content_text_block_position,
                               content_text_block.franklin_object_id AS content_text_block_franklin_object_id,
                               content_text_block.created_at         AS content_text_block_created_at,
                               content_text_block.updated_at         AS content_text_block_updated_at
                        FROM   franklin_object
                               INNER JOIN content_text_block
                                       ON franklin_object.id = content_text_block.id) AS anon_4
                    ON anon_4.content_text_block_id = contents_franklin_objects_2.content_id
       LEFT OUTER JOIN (SELECT franklin_object.id         AS franklin_object_id,
                               franklin_object.type       AS franklin_object_type,
                               franklin_object.uuid       AS franklin_object_uuid,
                               stream_item.id             AS stream_item_id,
                               stream_item.parent_id      AS stream_item_parent_id,
                               stream_item.shown_at       AS stream_item_shown_at,
                               stream_item.author_id      AS stream_item_author_id,
                               stream_item.stream_sort_at AS stream_item_stream_sort_at,
                               stream_item.PUBLIC         AS stream_item_public,
                               stream_item.created_at     AS stream_item_created_at,
                               stream_item.updated_at     AS stream_item_updated_at
                        FROM   franklin_object
                               INNER JOIN stream_item
                                       ON franklin_object.id = stream_item.id) AS anon_6
                    ON anon_6.stream_item_parent_id = franklin_object.id
       LEFT OUTER JOIN likers AS likers_2
                    ON anon_6.stream_item_id = likers_2.post_id
       LEFT OUTER JOIN (SELECT franklin_object.id         AS franklin_object_id,
                               franklin_object.type       AS franklin_object_type,
                               franklin_object.uuid       AS franklin_object_uuid,
                               USER.id                    AS user_id,
                               USER.email                 AS user_email,
                               USER.password              AS user_password,
                               USER.auth_token            AS user_auth_token,
                               USER.auth_token_expiration AS user_auth_token_expiration,
                               USER.active                AS user_active,
                               USER.activation_token      AS user_activation_token,
                               USER.first_name            AS user_first_name,
                               USER.last_name             AS user_last_name,
                               USER.image                 AS user_image,
                               USER.bio                   AS user_bio,
                               USER.aspirations           AS user_aspirations,
                               USER.website               AS user_website,
                               USER.resume                AS user_resume,
                               USER.resume_name           AS user_resume_name,
                               USER.primary_role          AS user_primary_role,
                               USER.institution_id        AS user_institution_id,
                               USER.birth_date            AS user_birth_date,
                               USER.gender                AS user_gender,
                               USER.graduation_year       AS user_graduation_year,
                               USER.complete              AS user_complete,
                               USER.masthead_y_position   AS user_masthead_y_position,
                               USER.masthead              AS user_masthead,
                               USER.fb_access_token       AS user_fb_access_token,
                               USER.fb_user_id            AS user_fb_user_id,
                               USER.location              AS user_location,
                               USER.created_at            AS user_created_at,
                               USER.updated_at            AS user_updated_at
                        FROM   franklin_object
                               INNER JOIN USER
                                       ON franklin_object.id = USER.id) AS anon_5
                    ON anon_5.user_id = likers_2.user_id
WHERE  stream_item.parent_id = 11
ORDER  BY stream_item.stream_sort_at DESC,
          anon_1.content_text_block_position,
          anon_6.stream_item_stream_sort_at DESC 

And the EXPLAIN output:

id  select_type  table                        type    possible_keys       key                 key_len  ref                                                      rows  Extra
1   PRIMARY      <derived2>                   ALL     NULL                NULL                NULL     NULL                                                     599   Using temporary; Using filesort
1   PRIMARY      stream_item                  eq_ref  PRIMARY,parent_id   PRIMARY             4        anon_1.content_text_block_franklin_object_id             1     Using where
1   PRIMARY      contents_resources_1         ref     content_id          content_id          5        anon_1.content_text_block_id                             2
1   PRIMARY      <derived3>                   ALL     NULL                NULL                NULL     NULL                                                     7
1   PRIMARY      contents_franklin_objects_1  ref     content_id          content_id          5        anon_1.content_text_block_id                             1
1   PRIMARY      franklin_object              eq_ref  PRIMARY             PRIMARY             4        franklin.stream_item.id                                  1     Using where
1   PRIMARY      franklin_object_1            eq_ref  PRIMARY             PRIMARY             4        franklin.contents_franklin_objects_1.franklin_object_id  1
1   PRIMARY      likers_1                     ref     post_id             post_id             5        franklin.stream_item.id                                  1
1   PRIMARY      <derived4>                   ALL     NULL                NULL                NULL     NULL                                                     136
1   PRIMARY      contents_franklin_objects_2  ref     franklin_object_id  franklin_object_id  5        franklin.stream_item.id                                  1
1   PRIMARY      <derived5>                   ALL     NULL                NULL                NULL     NULL                                                     599
1   PRIMARY      <derived6>                   ALL     NULL                NULL                NULL     NULL                                                     608
1   PRIMARY      likers_2                     ref     post_id             post_id             5        anon_6.stream_item_id                                    1
1   PRIMARY      <derived7>                   ALL     NULL                NULL                NULL     NULL                                                     136
7   DERIVED      user                         ALL     PRIMARY             NULL                NULL     NULL                                                     133
7   DERIVED      franklin_object              eq_ref  PRIMARY             PRIMARY             4        franklin.user.id                                         1
6   DERIVED      stream_item                  ALL     PRIMARY             NULL                NULL     NULL                                                     709
6   DERIVED      franklin_object              eq_ref  PRIMARY             PRIMARY             4        franklin.stream_item.id                                  1
5   DERIVED      content_text_block           ALL     PRIMARY             NULL                NULL     NULL                                                     666
5   DERIVED      franklin_object              eq_ref  PRIMARY             PRIMARY             4        franklin.content_text_block.id                           1
4   DERIVED      user                         ALL     PRIMARY             NULL                NULL     NULL                                                     133
4   DERIVED      franklin_object              eq_ref  PRIMARY             PRIMARY             4        franklin.user.id                                         1
3   DERIVED      resource                     ALL     PRIMARY             NULL                NULL     NULL                                                     7
3   DERIVED      franklin_object              eq_ref  PRIMARY             PRIMARY             4        franklin.resource.id                                     1
2   DERIVED      content_text_block           ALL     PRIMARY             NULL                NULL     NULL                                                     666
2   DERIVED      franklin_object              eq_ref  PRIMARY             PRIMARY             4        franklin.content_text_block.id                           1

How do I reduce the ALL queries to something faster? What are other ways I can speed this up?

Is the way franklin_objects are set up an antipattern? The way it works is that the franklin_object table has two columns: id and type. Then each type is a table, with a primary key that is a foreign key into franklin_object.

The code that generates the sql is something along the lines of:

stream_item_query = StreamItem.query.options(
    db.joinedload('stream_items'),
    db.joinedload('contents_included_in'),
    db.joinedload('contents.resources'),
    db.joinedload('contents.objects'),
    db.subqueryload('likers'),
)

stream_items = (
    stream_item_query
    .filter(StreamItem.parent_id == community_id)
    .order_by(db.desc(StreamItem.stream_sort_at))
    .all()
)

Nicholas Meyer asked Jul 16 '12 18:07


1 Answer

Wow, this one hurt my brain a little. Trying to figure out what the query is doing, what all the tables are, and the relationships was tedious. If you have a similar experience, let that be the first hint that you're probably trying to do too much in this single query.

My suggestion is to rethink your entire approach.

SQLAlchemy is a pretty nice tool, and I'm not going to bash it (or your choice of MySQL), but as with most ORM tools you need to consider the costs that come with their use. One example is this franklin_object table business. Is this an anti-pattern? Yes and no. It makes sense from a purely OO perspective: you can determine which table to query by looking up an id in this table. From a relational querying perspective, it serves very little purpose. I could remove every instance of franklin_object from your query and lose nothing but...the columns from franklin_object. If that's a viable option, I would do that right away.

Let's examine this linking with franklin_object further. Looking at the sub-queries, they all have the same form:

  (SELECT franklin_object.id           AS franklin_object_id,
          franklin_object.type         AS franklin_object_type,
          franklin_object.uuid         AS franklin_object_uuid,
          linked_table.id              AS linked_table_id,
          linked_table.col2            AS col2 -- and more
   FROM   franklin_object
          INNER JOIN linked_table
                  ON franklin_object.id = linked_table.id) AS anon_n

There isn't much information for the database to go on when optimizing this part of the query, regardless of statistics: the sub-query has no filter, so every row of linked_table qualifies. Perhaps if franklin_object were limited by specifying the type in a WHERE clause the query would be better. Maybe.

This is especially problematic with the USER table, as this table has a lot of records (so you say). Since you are querying most of the columns, and the optimizer can't accurately figure out how many rows will be retrieved, it makes sense that a full table scan is performed. In your case, twice.

Another aspect is the sheer number of joins involved. If we take out all the franklin_object references, there are still 11 joins. That wouldn't be terrible if your data model were more relational, but it isn't. The generated query doesn't give the database much help in figuring out the best way to perform the query, and so it doesn't do a good job. Maybe you could mitigate this with hints and so forth, but I bet this will bite you in the long run.

You're using an ORM tool, so really use it. You don't gain anything by having such a large query done all at once. It could be split up a bit for performance. Perform lazy retrieves to avoid huge, complicated queries. I would say try, just to see how it goes, to do everything lazily. Performance will likely be OK, and I'd guess better than it is now. Not great, probably not even acceptable, but better than being able to get coffee while the database is churning.
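To make the idea of splitting the query concrete, here is a minimal sketch of loading the stream in two small, indexed queries instead of one giant join. It is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than MySQL and SQLAlchemy so that it runs on its own, the schema is cut down to a few columns, and the index names and the `load_stream` helper are hypothetical. It batches the second lookup with IN rather than issuing one lazy query per item, which is similar in spirit to what `subqueryload` does for `likers` in the question's code.

```python
import sqlite3
from collections import defaultdict

# Hypothetical, heavily simplified schema: only the columns the sketch needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stream_item (id INTEGER PRIMARY KEY, parent_id INTEGER,
                              stream_sort_at INTEGER);
    CREATE TABLE content_text_block (id INTEGER PRIMARY KEY,
                                     franklin_object_id INTEGER,
                                     position INTEGER, text TEXT);
    CREATE INDEX ix_stream_item_parent ON stream_item (parent_id);
    CREATE INDEX ix_ctb_owner ON content_text_block (franklin_object_id);
    INSERT INTO stream_item VALUES (1, 11, 100), (2, 11, 200), (3, 12, 300);
    INSERT INTO content_text_block VALUES
        (10, 1, 0, 'first'), (11, 1, 1, 'second'), (12, 2, 0, 'hello');
""")


def load_stream(conn, parent_id):
    """Return [(stream_item_id, [block texts])], newest item first."""
    # Query 1: only the stream items for the course.
    items = conn.execute(
        "SELECT id, stream_sort_at FROM stream_item "
        "WHERE parent_id = ? ORDER BY stream_sort_at DESC",
        (parent_id,),
    ).fetchall()
    ids = [row[0] for row in items]
    if not ids:
        return []

    # Query 2: every text block for those items in a single round trip.
    placeholders = ",".join("?" * len(ids))
    blocks = defaultdict(list)
    rows = conn.execute(
        "SELECT franklin_object_id, text FROM content_text_block "
        f"WHERE franklin_object_id IN ({placeholders}) "
        "ORDER BY franklin_object_id, position",
        ids,
    )
    for owner_id, text in rows:
        blocks[owner_id].append(text)

    # Stitch the two result sets together in application code.
    return [(item_id, blocks[item_id]) for item_id, _ in items]


stream = load_stream(conn, 11)
print(stream)
```

Each query filters on an indexed column and returns only the rows it needs, so neither one has to scan or materialize a whole table.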

Then, start piecing things together into more streamlined chunks. Tie together objects which logically belong together, such as resource and contents_resources. Another example: the join between stream_item, likers and user appears twice in the query. Make that one query and let SQLAlchemy do its thing.
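As a rough sketch of collapsing the duplicated likers lookup, the example below collects the ids of the top-level stream items and their replies, then resolves likers for all of them with a single likers/user join. The assumptions are the same kind as before: sqlite3 stands in for MySQL and SQLAlchemy, the schema is reduced to a handful of columns, and `likers_for` is a hypothetical helper name, not part of any library.

```python
import sqlite3
from collections import defaultdict

# Hypothetical, simplified schema; items 2 and 3 are replies to item 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stream_item (id INTEGER PRIMARY KEY, parent_id INTEGER);
    CREATE TABLE user (id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE likers (post_id INTEGER, user_id INTEGER);
    CREATE INDEX ix_likers_post ON likers (post_id);
    INSERT INTO stream_item VALUES (1, 11), (2, 1), (3, 1);
    INSERT INTO user VALUES (100, 'Ann'), (101, 'Bob');
    INSERT INTO likers VALUES (1, 100), (2, 100), (2, 101);
""")


def likers_for(conn, post_ids):
    """Return {post_id: [first_name, ...]} using one likers/user join."""
    result = defaultdict(list)
    if not post_ids:
        return result
    placeholders = ",".join("?" * len(post_ids))
    rows = conn.execute(
        "SELECT likers.post_id, user.first_name FROM likers "
        "JOIN user ON user.id = likers.user_id "
        f"WHERE likers.post_id IN ({placeholders}) "
        "ORDER BY likers.post_id, user.id",
        list(post_ids),
    )
    for post_id, name in rows:
        result[post_id].append(name)
    return result


# Top-level items for course 11, then their replies.
top_ids = [r[0] for r in conn.execute(
    "SELECT id FROM stream_item WHERE parent_id = ? ORDER BY id", (11,))]
marks = ",".join("?" * len(top_ids))
reply_ids = [r[0] for r in conn.execute(
    f"SELECT id FROM stream_item WHERE parent_id IN ({marks}) ORDER BY id",
    top_ids)]

# One lookup covers both levels, instead of two separate joins to user.
likes = dict(likers_for(conn, top_ids + reply_ids))
print(likes)
```

The user table is touched once, through its primary key, and only for users who actually liked something in this stream.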

As a last resort, some kind of caching mechanism could be implemented. Perhaps denormalize the tables somewhere. On a slow-changing, read-heavy system you could have these tables feed into another structure where the queries are straightforward and fast. That is, do the processing up front and store the result in a single table.

Good luck.
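One way to picture the denormalized option is a read-optimized table with one row per stream item, rebuilt from the normalized tables whenever the content changes. The sketch below is an assumption-laden illustration only: it uses sqlite3 so it is self-contained, and the `stream_feed` table, its columns, and the `rebuild_feed` helper are all hypothetical names invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified versions of the normalized tables.
    CREATE TABLE stream_item (id INTEGER PRIMARY KEY, parent_id INTEGER,
                              stream_sort_at INTEGER);
    CREATE TABLE content_text_block (id INTEGER PRIMARY KEY,
                                     franklin_object_id INTEGER,
                                     position INTEGER, text TEXT);
    CREATE TABLE likers (post_id INTEGER, user_id INTEGER);
    INSERT INTO stream_item VALUES (1, 11, 100), (2, 11, 200);
    INSERT INTO content_text_block VALUES (10, 1, 0, 'first'), (11, 2, 0, 'hello');
    INSERT INTO likers VALUES (1, 100), (1, 101);

    -- Hypothetical read-optimized table: one row per stream item.
    CREATE TABLE stream_feed (
        stream_item_id INTEGER PRIMARY KEY,
        course_id      INTEGER,
        stream_sort_at INTEGER,
        first_text     TEXT,
        like_count     INTEGER
    );
    CREATE INDEX ix_feed_course ON stream_feed (course_id, stream_sort_at);
""")


def rebuild_feed(conn):
    """Recompute the denormalized table from the normalized ones."""
    with conn:  # run the delete and insert in a single transaction
        conn.execute("DELETE FROM stream_feed")
        conn.execute("""
            INSERT INTO stream_feed
            SELECT s.id, s.parent_id, s.stream_sort_at,
                   (SELECT c.text FROM content_text_block c
                     WHERE c.franklin_object_id = s.id
                     ORDER BY c.position LIMIT 1),
                   (SELECT COUNT(*) FROM likers l WHERE l.post_id = s.id)
            FROM stream_item s
        """)


rebuild_feed(conn)

# The read path is now a single-table, indexed query.
feed = conn.execute(
    "SELECT stream_item_id, first_text, like_count FROM stream_feed "
    "WHERE course_id = ? ORDER BY stream_sort_at DESC",
    (11,),
).fetchall()
print(feed)
```

The trade-off is the usual one for caches: reads become cheap, but something has to call the rebuild (or an incremental update) whenever the underlying rows change.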

Adam Hawkes answered Oct 05 '22 23:10