I have a query generated by SQLAlchemy ORM. It is supposed to retrieve stream_items for a specific course, along with all of their parts - resources, content text blocks, etc., and the users who posted them. However, this query appears to be extremely slow, taking minutes on our production database with 20,000 or so users in the database, 25 or so stream_items for the course, and a couple content text blocks per stream_item. Note that there are very few of any other records besides users in the database because we imported a bunch of users but very little content.
Edit: Note that every object id is a foreign key into the franklin_object table.
I've tried looking at the query, and have identified several troubling bits (looking at the EXPLAIN output)
However, I really don't know what to do about these, especially the latter two issues.
Here is the query:
SELECT stream_item.id AS stream_item_id,
franklin_object.id AS franklin_object_id,
franklin_object.type AS franklin_object_type,
franklin_object.uuid AS franklin_object_uuid,
stream_item.parent_id AS stream_item_parent_id,
stream_item.shown_at AS stream_item_shown_at,
stream_item.author_id AS stream_item_author_id,
stream_item.stream_sort_at AS stream_item_stream_sort_at,
stream_item.PUBLIC AS stream_item_public,
stream_item.created_at AS stream_item_created_at,
stream_item.updated_at AS stream_item_updated_at,
anon_1.content_text_block_text AS anon_1_content_text_block_text,
anon_2.resource_id AS anon_2_resource_id,
anon_2.franklin_object_id AS anon_2_franklin_object_id,
anon_2.franklin_object_type AS anon_2_franklin_object_type,
anon_2.franklin_object_uuid AS anon_2_franklin_object_uuid,
anon_2.resource_top_parent_resource AS anon_2_resource_top_parent_resource,
anon_2.resource_top_parent_id AS anon_2_resource_top_parent_id,
anon_2.resource_title AS anon_2_resource_title,
anon_2.resource_url AS anon_2_resource_url,
anon_2.resource_image AS anon_2_resource_image,
anon_2.resource_created_at AS anon_2_resource_created_at,
anon_2.resource_updated_at AS anon_2_resource_updated_at,
franklin_object_1.id AS franklin_object_1_id,
franklin_object_1.type AS franklin_object_1_type,
franklin_object_1.uuid AS franklin_object_1_uuid,
anon_1.content_text_block_id AS anon_1_content_text_block_id,
anon_1.franklin_object_id AS anon_1_franklin_object_id,
anon_1.franklin_object_type AS anon_1_franklin_object_type,
anon_1.franklin_object_uuid AS anon_1_franklin_object_uuid,
anon_1.content_text_block_position AS anon_1_content_text_block_position,
anon_1.content_text_block_franklin_object_id AS anon_1_content_text_block_franklin_object_id,
anon_1.content_text_block_created_at AS anon_1_content_text_block_created_at,
anon_1.content_text_block_updated_at AS anon_1_content_text_block_updated_at,
anon_3.user_password AS anon_3_user_password,
anon_3.user_auth_token AS anon_3_user_auth_token,
anon_3.user_id AS anon_3_user_id,
anon_3.franklin_object_id AS anon_3_franklin_object_id,
anon_3.franklin_object_type AS anon_3_franklin_object_type,
anon_3.franklin_object_uuid AS anon_3_franklin_object_uuid,
anon_3.user_email AS anon_3_user_email,
anon_3.user_auth_token_expiration AS anon_3_user_auth_token_expiration,
anon_3.user_active AS anon_3_user_active,
anon_3.user_activation_token AS anon_3_user_activation_token,
anon_3.user_first_name AS anon_3_user_first_name,
anon_3.user_last_name AS anon_3_user_last_name,
anon_3.user_image AS anon_3_user_image,
anon_3.user_bio AS anon_3_user_bio,
anon_3.user_aspirations AS anon_3_user_aspirations,
anon_3.user_website AS anon_3_user_website,
anon_3.user_resume AS anon_3_user_resume,
anon_3.user_resume_name AS anon_3_user_resume_name,
anon_3.user_primary_role AS anon_3_user_primary_role,
anon_3.user_institution_id AS anon_3_user_institution_id,
anon_3.user_birth_date AS anon_3_user_birth_date,
anon_3.user_gender AS anon_3_user_gender,
anon_3.user_graduation_year AS anon_3_user_graduation_year,
anon_3.user_complete AS anon_3_user_complete,
anon_3.user_masthead_y_position AS anon_3_user_masthead_y_position,
anon_3.user_masthead AS anon_3_user_masthead,
anon_3.user_fb_access_token AS anon_3_user_fb_access_token,
anon_3.user_fb_user_id AS anon_3_user_fb_user_id,
anon_3.user_location AS anon_3_user_location,
anon_3.user_created_at AS anon_3_user_created_at,
anon_3.user_updated_at AS anon_3_user_updated_at,
anon_4.content_text_block_text AS anon_4_content_text_block_text,
anon_4.content_text_block_id AS anon_4_content_text_block_id,
anon_4.franklin_object_id AS anon_4_franklin_object_id,
anon_4.franklin_object_type AS anon_4_franklin_object_type,
anon_4.franklin_object_uuid AS anon_4_franklin_object_uuid,
anon_4.content_text_block_position AS anon_4_content_text_block_position,
anon_4.content_text_block_franklin_object_id AS anon_4_content_text_block_franklin_object_id,
anon_4.content_text_block_created_at AS anon_4_content_text_block_created_at,
anon_4.content_text_block_updated_at AS anon_4_content_text_block_updated_at,
anon_5.user_password AS anon_5_user_password,
anon_5.user_auth_token AS anon_5_user_auth_token,
anon_5.user_id AS anon_5_user_id,
anon_5.franklin_object_id AS anon_5_franklin_object_id,
anon_5.franklin_object_type AS anon_5_franklin_object_type,
anon_5.franklin_object_uuid AS anon_5_franklin_object_uuid,
anon_5.user_email AS anon_5_user_email,
anon_5.user_auth_token_expiration AS anon_5_user_auth_token_expiration,
anon_5.user_active AS anon_5_user_active,
anon_5.user_activation_token AS anon_5_user_activation_token,
anon_5.user_first_name AS anon_5_user_first_name,
anon_5.user_last_name AS anon_5_user_last_name,
anon_5.user_image AS anon_5_user_image,
anon_5.user_bio AS anon_5_user_bio,
anon_5.user_aspirations AS anon_5_user_aspirations,
anon_5.user_website AS anon_5_user_website,
anon_5.user_resume AS anon_5_user_resume,
anon_5.user_resume_name AS anon_5_user_resume_name,
anon_5.user_primary_role AS anon_5_user_primary_role,
anon_5.user_institution_id AS anon_5_user_institution_id,
anon_5.user_birth_date AS anon_5_user_birth_date,
anon_5.user_gender AS anon_5_user_gender,
anon_5.user_graduation_year AS anon_5_user_graduation_year,
anon_5.user_complete AS anon_5_user_complete,
anon_5.user_masthead_y_position AS anon_5_user_masthead_y_position,
anon_5.user_masthead AS anon_5_user_masthead,
anon_5.user_fb_access_token AS anon_5_user_fb_access_token,
anon_5.user_fb_user_id AS anon_5_user_fb_user_id,
anon_5.user_location AS anon_5_user_location,
anon_5.user_created_at AS anon_5_user_created_at,
anon_5.user_updated_at AS anon_5_user_updated_at,
anon_6.stream_item_id AS anon_6_stream_item_id,
anon_6.franklin_object_id AS anon_6_franklin_object_id,
anon_6.franklin_object_type AS anon_6_franklin_object_type,
anon_6.franklin_object_uuid AS anon_6_franklin_object_uuid,
anon_6.stream_item_parent_id AS anon_6_stream_item_parent_id,
anon_6.stream_item_shown_at AS anon_6_stream_item_shown_at,
anon_6.stream_item_author_id AS anon_6_stream_item_author_id,
anon_6.stream_item_stream_sort_at AS anon_6_stream_item_stream_sort_at,
anon_6.stream_item_public AS anon_6_stream_item_public,
anon_6.stream_item_created_at AS anon_6_stream_item_created_at,
anon_6.stream_item_updated_at AS anon_6_stream_item_updated_at
FROM franklin_object
INNER JOIN stream_item
ON franklin_object.id = stream_item.id
INNER JOIN (SELECT franklin_object.id AS franklin_object_id,
franklin_object.type AS franklin_object_type,
franklin_object.uuid AS franklin_object_uuid,
content_text_block.id AS content_text_block_id,
content_text_block.text AS content_text_block_text,
content_text_block.position AS content_text_block_position,
content_text_block.franklin_object_id AS content_text_block_franklin_object_id,
content_text_block.created_at AS content_text_block_created_at,
content_text_block.updated_at AS content_text_block_updated_at
FROM franklin_object
INNER JOIN content_text_block
ON franklin_object.id = content_text_block.id) AS anon_1
ON stream_item.id = anon_1.content_text_block_franklin_object_id
LEFT OUTER JOIN contents_resources AS contents_resources_1
ON anon_1.content_text_block_id = contents_resources_1.content_id
LEFT OUTER JOIN (SELECT franklin_object.id AS franklin_object_id,
franklin_object.type AS franklin_object_type,
franklin_object.uuid AS franklin_object_uuid,
resource.id AS resource_id,
resource.top_parent_resource AS resource_top_parent_resource,
resource.top_parent_id AS resource_top_parent_id,
resource.title AS resource_title,
resource.url AS resource_url,
resource.image AS resource_image,
resource.created_at AS resource_created_at,
resource.updated_at AS resource_updated_at
FROM franklin_object
INNER JOIN resource
ON franklin_object.id = resource.id) AS anon_2
ON anon_2.resource_id = contents_resources_1.resource_id
LEFT OUTER JOIN contents_franklin_objects AS contents_franklin_objects_1
ON anon_1.content_text_block_id = contents_franklin_objects_1.content_id
LEFT OUTER JOIN franklin_object AS franklin_object_1
ON franklin_object_1.id = contents_franklin_objects_1.franklin_object_id
LEFT OUTER JOIN likers AS likers_1
ON stream_item.id = likers_1.post_id
LEFT OUTER JOIN (SELECT franklin_object.id AS franklin_object_id,
franklin_object.type AS franklin_object_type,
franklin_object.uuid AS franklin_object_uuid,
USER.id AS user_id,
USER.email AS user_email,
USER.password AS user_password,
USER.auth_token AS user_auth_token,
USER.auth_token_expiration AS user_auth_token_expiration,
USER.active AS user_active,
USER.activation_token AS user_activation_token,
USER.first_name AS user_first_name,
USER.last_name AS user_last_name,
USER.image AS user_image,
USER.bio AS user_bio,
USER.aspirations AS user_aspirations,
USER.website AS user_website,
USER.resume AS user_resume,
USER.resume_name AS user_resume_name,
USER.primary_role AS user_primary_role,
USER.institution_id AS user_institution_id,
USER.birth_date AS user_birth_date,
USER.gender AS user_gender,
USER.graduation_year AS user_graduation_year,
USER.complete AS user_complete,
USER.masthead_y_position AS user_masthead_y_position,
USER.masthead AS user_masthead,
USER.fb_access_token AS user_fb_access_token,
USER.fb_user_id AS user_fb_user_id,
USER.location AS user_location,
USER.created_at AS user_created_at,
USER.updated_at AS user_updated_at
FROM franklin_object
INNER JOIN USER
ON franklin_object.id = USER.id) AS anon_3
ON anon_3.user_id = likers_1.user_id
LEFT OUTER JOIN contents_franklin_objects AS contents_franklin_objects_2
ON franklin_object.id = contents_franklin_objects_2.franklin_object_id
LEFT OUTER JOIN (SELECT franklin_object.id AS franklin_object_id,
franklin_object.type AS franklin_object_type,
franklin_object.uuid AS franklin_object_uuid,
content_text_block.id AS content_text_block_id,
content_text_block.text AS content_text_block_text,
content_text_block.position AS content_text_block_position,
content_text_block.franklin_object_id AS content_text_block_franklin_object_id,
content_text_block.created_at AS content_text_block_created_at,
content_text_block.updated_at AS content_text_block_updated_at
FROM franklin_object
INNER JOIN content_text_block
ON franklin_object.id = content_text_block.id) AS anon_4
ON anon_4.content_text_block_id = contents_franklin_objects_2.content_id
LEFT OUTER JOIN (SELECT franklin_object.id AS franklin_object_id,
franklin_object.type AS franklin_object_type,
franklin_object.uuid AS franklin_object_uuid,
stream_item.id AS stream_item_id,
stream_item.parent_id AS stream_item_parent_id,
stream_item.shown_at AS stream_item_shown_at,
stream_item.author_id AS stream_item_author_id,
stream_item.stream_sort_at AS stream_item_stream_sort_at,
stream_item.PUBLIC AS stream_item_public,
stream_item.created_at AS stream_item_created_at,
stream_item.updated_at AS stream_item_updated_at
FROM franklin_object
INNER JOIN stream_item
ON franklin_object.id = stream_item.id) AS anon_6
ON anon_6.stream_item_parent_id = franklin_object.id
LEFT OUTER JOIN likers AS likers_2
ON anon_6.stream_item_id = likers_2.post_id
LEFT OUTER JOIN (SELECT franklin_object.id AS franklin_object_id,
franklin_object.type AS franklin_object_type,
franklin_object.uuid AS franklin_object_uuid,
USER.id AS user_id,
USER.email AS user_email,
USER.password AS user_password,
USER.auth_token AS user_auth_token,
USER.auth_token_expiration AS user_auth_token_expiration,
USER.active AS user_active,
USER.activation_token AS user_activation_token,
USER.first_name AS user_first_name,
USER.last_name AS user_last_name,
USER.image AS user_image,
USER.bio AS user_bio,
USER.aspirations AS user_aspirations,
USER.website AS user_website,
USER.resume AS user_resume,
USER.resume_name AS user_resume_name,
USER.primary_role AS user_primary_role,
USER.institution_id AS user_institution_id,
USER.birth_date AS user_birth_date,
USER.gender AS user_gender,
USER.graduation_year AS user_graduation_year,
USER.complete AS user_complete,
USER.masthead_y_position AS user_masthead_y_position,
USER.masthead AS user_masthead,
USER.fb_access_token AS user_fb_access_token,
USER.fb_user_id AS user_fb_user_id,
USER.location AS user_location,
USER.created_at AS user_created_at,
USER.updated_at AS user_updated_at
FROM franklin_object
INNER JOIN USER
ON franklin_object.id = USER.id) AS anon_5
ON anon_5.user_id = likers_2.user_id
WHERE stream_item.parent_id = 11
ORDER BY stream_item.stream_sort_at DESC,
anon_1.content_text_block_position,
anon_6.stream_item_stream_sort_at DESC
And the EXPLAIN output:
ID SELECT_TYPE TABLE POSSIBLY_KEYS KEY KEY_LEN REF ROWS EXTRA
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 599 Using temporary; Using filesort
1 PRIMARY stream_item eq_ref PRIMARY,parent_id PRIMARY 4 anon_1.content_text_block_franklin_object_id 1 Using where
1 PRIMARY contents_resources_1 ref content_id content_id 5 anon_1.content_text_block_id 2
1 PRIMARY <derived3> ALL NULL NULL NULL NULL 7
1 PRIMARY contents_franklin_objects_1 ref content_id content_id 5 anon_1.content_text_block_id 1
1 PRIMARY franklin_object eq_ref PRIMARY PRIMARY 4 franklin.stream_item.id 1 Using where
1 PRIMARY franklin_object_1 eq_ref PRIMARY PRIMARY 4 franklin.contents_franklin_objects_1.franklin_object_id 1
1 PRIMARY likers_1 ref post_id post_id 5 franklin.stream_item.id 1
1 PRIMARY <derived4> ALL NULL NULL NULL NULL 136
1 PRIMARY contents_franklin_objects_2 ref franklin_object_id franklin_object_id 5 franklin.stream_item.id 1
1 PRIMARY <derived5> ALL NULL NULL NULL NULL 599
1 PRIMARY <derived6> ALL NULL NULL NULL NULL 608
1 PRIMARY likers_2 ref post_id post_id 5 anon_6.stream_item_id 1
1 PRIMARY <derived7> ALL NULL NULL NULL NULL 136
7 DERIVED user ALL PRIMARY NULL NULL NULL 133
7 DERIVED franklin_object eq_ref PRIMARY PRIMARY 4 franklin.user.id 1
6 DERIVED stream_item ALL PRIMARY NULL NULL NULL 709
6 DERIVED franklin_object eq_ref PRIMARY PRIMARY 4 franklin.stream_item.id 1
5 DERIVED content_text_block ALL PRIMARY NULL NULL NULL 666
5 DERIVED franklin_object eq_ref PRIMARY PRIMARY 4 franklin.content_text_block.id 1
4 DERIVED user ALL PRIMARY NULL NULL NULL 133
4 DERIVED franklin_object eq_ref PRIMARY PRIMARY 4 franklin.user.id 1
3 DERIVED resource ALL PRIMARY NULL NULL NULL 7
3 DERIVED franklin_object eq_ref PRIMARY PRIMARY 4 franklin.resource.id 1
2 DERIVED content_text_block ALL PRIMARY NULL NULL NULL 666
2 DERIVED franklin_object eq_ref PRIMARY PRIMARY 4 franklin.content_text_block.id 1
How do I reduce the ALL queries to something faster? What are other ways I can speed this up?
Is the way franklin_objects are set up an antipattern? The way it works is that the franklin_object table has two columns: id and type. Then each type is a table, with a primary key that is a foreign key into franklin_object.
The code that generates the sql is something along the lines of:
stream_item_query = StreamItem.query.options(db.joinedload('stream_items'),db.joinedload('contents_included_in'),db.joinedload('contents.resources'),db.joinedload('contents.objects'),db.subqueryload('likers'))
stream_items = stream_item_query.filter(StreamItem.parent_id == community_id).order_by(db.desc(StreamItem.stream_sort_at)).all()
Wow, this one hurt my brain a little. Trying to figure out what the query is doing, what all the tables are, and the relationships was tedious. If you have a similar experience, let that be the first hint that you're probably trying to do too much in this single query.
My suggestion is to rethink your entire approach.
SQLAlchemy is a pretty nice tool, and I'm not going to bash it (or your choice of mysql), but as with most ORM tools you need to consider the costs with their use. One example is this franklin_object
table business. Is this an anti-pattern? Yes and No. It makes sense from a purely OO perspective. You can determine which tables to query by looking up an id
in this table. From a relational querying perspective, it serves very little purpose. I could remove every instance of franklin_object
from your query and lose nothing but...the columns from franklin_object
. If that's a viable option, I would do that right away.
Let's examine this linking with franklin_object
further. Looking at the sub-queries, they all have the same form:
SELECT franklin_object.id AS franklin_object_id,
franklin_object.type AS franklin_object_type,
franklin_object.uuid AS franklin_object_uuid,
linked_table.id AS linked_table_id,
linked_table.col2 AS col2 --and more
FROM franklin_object
INNER JOIN linked_table
ON franklin_object.id = linked_table.id) AS anon_n
There isn't much information for the database to go on as far as how to optimize this part of the query, regardless of statistics. Perhaps if franklin_object
were limited by specifying the type
in a where
clause the query would be better. Maybe.
This is especially problematic with the USER table, as this table has a lot of records (so you say). Since you are querying most of the columns, and the optimizer can't accurately figure out how many rows will be retrieved, it makes sense that a full table scan be performed. In your case, twice.
Another aspect is the sheer number of joins involved. If we take out all the franklin_object
references, there are still 11 joins. That's not terrible, if your data model was more relational, but it isn't. The generated query doesn't give much help to the database to figure out the best way to perform the query, and so it doesn't do a good job. Maybe you could mitigate this with hints and so forth, but I bet this will bite you in the long-run.
You're using an ORM tool, so really use it. You don't gain anything by having such a large query done all at once. It could be split up a bit for performance. Perform lazy retrieves to avoid huge, complicated queries. I would say try, just to see how it goes, to do everything lazily. Performance will likely be ok, I'd say better. Not great, probably not even acceptable, but better than being able to get coffee while the database is churning.
Then, start piecing things together into more streamlined chunks. Tie together objects which logically make sense, such as resource
and contents_resources
. Another example, the connection between stream_item
, likers
and user
is duplicated. Make that one query and let SQLAlchemy do its thing.
As a last resort, some kind of caching mechanism could be implemented. Perhaps denormalize the tables somewhere. On a slow-changing, read-heavy system you could have these tables feed into another structure where the queries are straight-forward and fast. That is, to do the processing up-front and store it in a single table.
Good Luck
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With