I have a problem with a query which takes far too long (Over two seconds just for this simple query).
On first look it appears to be an indexing issue, all joined fields are indexed, but i cannot find what else I may need to index to speed this up. As soon as i add the fields i need to the query, it gets even slower.
SELECT `jobs`.`job_id` AS `job_id` FROM tabledef_Jobs AS jobs
LEFT JOIN tabledef_JobCatLink AS jobcats ON jobs.job_id = jobcats.job_id
LEFT JOIN tabledef_Applications AS apps ON jobs.job_id = apps.job_id
LEFT JOIN tabledef_Companies AS company ON jobs.company_id = company.company_id
GROUP BY `jobs`.`job_id`
ORDER BY `jobs`.`date_posted` ASC
LIMIT 0 , 50
Table row counts (~): tabledef_Jobs (108k), tabledef_JobCatLink (109k), tabledef_Companies (100), tabledef_Applications (50k)
Here you can see the Describe. 'Using temporary' appears to be what is slowing down the query:
table index screenshots:
Any help would be greatly appreciated
EDIT WITH ANSWER
Final improved query with thanks to @Steve (marked answer). Ultimately, the final query was reduced from ~22s to ~0.3s:
SELECT `jobs`.`job_id` AS `job_id` FROM
(
SELECT * FROM tabledef_Jobs as jobs ORDER BY `jobs`.`date_posted` ASC LIMIT 0 , 50
) AS jobs
LEFT JOIN tabledef_JobCatLink AS jobcats ON jobs.job_id = jobcats.job_id
LEFT JOIN tabledef_Applications AS apps ON jobs.job_id = apps.job_id
LEFT JOIN tabledef_Companies AS company ON jobs.company_id = company.company_id
GROUP BY `jobs`.`job_id`
ORDER BY `jobs`.`date_posted` ASC
LIMIT 0 , 50
Improving performance with SQL aggregate functions The main problem with GROUP BY is that queries involving it are usually slow, especially when compared with WHERE -only queries.
performance - Mysql - LEFT JOIN way faster than INNER JOIN - Stack Overflow. Stack Overflow for Teams – Start collaborating and sharing organizational knowledge.
The most general way to satisfy a GROUP BY clause is to scan the whole table and create a new temporary table where all rows from each group are consecutive, and then use this temporary table to discover groups and apply aggregate functions (if any).
Joins: If your query joins two tables in a way that substantially increases the row count of the result set, your query is likely to be slow. There's an example of this in the subqueries lesson. Aggregations: Combining multiple rows to produce a result requires more computation than simply retrieving those rows.
Right, I’ll have a stab at this.
It would appear that the Query Optimiser cannot use an index to fulfil the query upon the tabledef_Jobs table.
You've got an offset limit and this with the combination of your ORDER BY cannot limit the amount of data before joining and thus it is having to group by job_id which is a PK and fast – but then order that data (temporary table and a filesort) before limiting and throwing away a the vast majorly of this data before finally join everything else to it.
I would suggest, adding a composite index to jobs of “job_id, date_posted”
So firstly optimise the base query:
SELECT * FROM tabledef_Jobs
GROUP BY job_id
ORDER BY date_posted
LIMIT 0,50
Then you can combine the joins and the final structure together to make a more efficient query.
I cannot let it go by without suggesting you rethink your limit offset. This is fine for small initial offsets but when it starts to get large this can be a major cause of performance issues. Let’s for example sake say you’re using this for pagination, what happens if they want page 3,000 – you will use
LIMIT 3000, 50
This will then collect 3050 rows / manipulate the data and then throw away the first 3000.
[edit 1 - In response to comments below]
I will expand with some more information that might point you in the right direction. Unfortunately there isn’t a simple fix that will resolve it , you must understand why this is happening to be able to address it. Simply removing the LIMIT or ORDER BY may not work and after all you don’t want to remove then as its part of your query which means it must be there for a purpose.
Optimise the simple base query first that is usually a lot easier than working with multi-joined datasets.
Despite all the bashing it receives there is nothing wrong with filesort. Sometimes this is the only way to execute the query. Agreed it can be the cause of many performance issues (especially on larger data sets) but that’s not usually the fault of filesort but the underlying query / indexing strategy.
Within MySQL you cannot mix indexes or mix orders of the same index – performing such a task will result in a filesort.
How about as I suggested creating an index on date_posted and then using:
SELECT jobs.job_id, jobs.date_posted, jobcats .*, apps.*, company .* FROM
(
SELECT DISTINCT job_id FROM tabledef_Jobs
ORDER BY date_posted
LIMIT 0,50
) AS jobs
LEFT JOIN tabledef_JobCatLink AS jobcats ON jobs.job_id = jobcats.job_id
LEFT JOIN tabledef_Applications AS apps ON jobs.job_id = apps.job_id
LEFT JOIN tabledef_Companies AS company ON jobs.company_id = company.company_id
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With