I have a model in which TRANSLATION references TRANSLATION_UNIT, which references DOCUMENT, which references JOB, with the following table sizes:
+------------------+---------+
| Table            | Records |
+------------------+---------+
| JOB              | 8k      |
| DOCUMENT         | 150k    |
| TRANSLATION_UNIT | 14.5m   |
| TRANSLATION      | 18.3m   |
+------------------+---------+
Now the following query
select translation.id
from "TRANSLATION" translation
inner join "TRANSLATION_UNIT" unit
on translation.fk_id_translation_unit = unit.id
inner join "DOCUMENT" document
on unit.fk_id_document = document.id
where document.fk_id_job = 11698
order by translation.id asc
limit 50 offset 0
takes about 90 seconds to finish. When I remove the ORDER BY and LIMIT clauses, it takes 19.5 seconds. ANALYZE had been run on all tables just before executing the query.
For this particular query, these are the numbers of records satisfying the criteria:
+------------------+---------+
| Table            | Records |
+------------------+---------+
| JOB              | 1       |
| DOCUMENT         | 1200    |
| TRANSLATION_UNIT | 210,000 |
| TRANSLATION      | 210,000 |
+------------------+---------+
The query plan for this query is here. The query plan for the modified query without ORDER BY and LIMIT is here.
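For reference, plans like the linked ones can be captured with EXPLAIN; the ANALYZE and BUFFERS options (both supported on 9.2) add actual row counts, timings, and buffer usage:
explain (analyze, buffers)
select translation.id
from "TRANSLATION" translation
inner join "TRANSLATION_UNIT" unit
on translation.fk_id_translation_unit = unit.id
inner join "DOCUMENT" document
on unit.fk_id_document = document.id
where document.fk_id_job = 11698
order by translation.id asc
limit 50 offset 0;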
Database parameters:
PostgreSQL 9.2
shared_buffers = 2048MB
effective_cache_size = 4096MB
work_mem = 32MB
Total memory: 32GB
CPU: Intel Xeon X3470 @ 2.93 GHz, 8MB cache
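To confirm these settings are actually in effect, they can be read back from a live session:
show shared_buffers;
show effective_cache_size;
show work_mem;
select version();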
Can anyone see what is wrong with this query?
UPDATE: Query plan for the same query without ORDER BY (but still with the LIMIT clause).
This is a bit too long for a comment. You are comparing apples and oranges when you remove the ORDER BY clause. Without the ORDER BY, the processing part of the query only needs to come up with 50 rows. With the ORDER BY, all the rows need to be generated before they are sorted and the top few chosen. How long does the query take if you remove both the ORDER BY and the LIMIT clause?
The fact that translation.id is a primary key does not make a difference, because the processing requires going through several joins (which filter the results).
EDIT:
I wonder how this would work with a CTE that first builds the result set, with an outer query to sort it and fetch the first rows. On 9.2 a CTE acts as an optimization fence, so the join result is materialized before the ORDER BY and LIMIT are applied:
with CTE as (
      select translation.id
      from "TRANSLATION" translation
      inner join "TRANSLATION_UNIT" unit
            on translation.fk_id_translation_unit = unit.id
      inner join "DOCUMENT" document
            on unit.fk_id_document = document.id
      where document.fk_id_job = 11698
)
select *
from CTE
order by id asc
limit 50 offset 0;
Do you have a composite index in place on translation(fk_id_translation_unit, id)? It seems to me that would help by avoiding the need to fetch translation.id from the table itself (9.2 supports index-only scans).
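A minimal sketch of that index, using the table and column names from the query above (the index name is illustrative only):
create index idx_translation_unit_id
on "TRANSLATION" (fk_id_translation_unit, id);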
In case anyone has the same problem: it happened to me, and I solved it by switching to an ordered index. I extended the index with the ID column (the PK) and an explicit sort direction, like this:
create index index_name on SCHEMA.TABLE (id asc, (sent_time IS NULL), some_id_ref, type);
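Applied to the query in this question, the same idea would mean making sure the join and filter columns are indexed as well, a sketch assuming no such indexes exist yet (index names are illustrative):
create index idx_document_job on "DOCUMENT" (fk_id_job);
create index idx_unit_document on "TRANSLATION_UNIT" (fk_id_document);
together with the composite index on "TRANSLATION" (fk_id_translation_unit, id) suggested above.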