PostgreSQL: Terribly slow ORDER BY with primary key as the ordering key

I have a model like this:

[image: data model diagram]

with the following table sizes:

+------------------+-------------+
| Table            |    Records  |
+------------------+-------------+
| JOB              |         8k  |
| DOCUMENT         |       150k  |
| TRANSLATION_UNIT |      14.5M  |
| TRANSLATION      |      18.3M  |
+------------------+-------------+

Now the following query

select translation.id
from "TRANSLATION" translation
   inner join "TRANSLATION_UNIT" unit
     on translation.fk_id_translation_unit = unit.id
   inner join "DOCUMENT" document
     on unit.fk_id_document = document.id     
where document.fk_id_job = 11698
order by translation.id asc
limit 50 offset 0

takes about 90 seconds to finish. When I remove the ORDER BY and LIMIT clauses, it takes 19.5 seconds. ANALYZE had been run on all tables just before executing the query.

For this particular query, these are the numbers of records satisfying the criteria:

+------------------+-------------+
| Table            |     Records |
+------------------+-------------+
| JOB              |          1  |
| DOCUMENT         |       1200  |
| TRANSLATION_UNIT |    210,000  |
| TRANSLATION      |    210,000  |
+------------------+-------------+

The query plan:

[image: query plan]

The query plan for the modification without ORDER BY and LIMIT is here.

Database parameters:

PostgreSQL 9.2

shared_buffers = 2048MB
effective_cache_size = 4096MB
work_mem = 32MB

Total memory: 32GB
CPU: Intel Xeon X3470 @ 2.93 GHz, 8MB cache

Can anyone see what is wrong with this query?

UPDATE: Query plan for the same query without ORDER BY (but still with the LIMIT clause).

asked Nov 04 '13 14:11 by twoflower


3 Answers

This is a bit too long for a comment. You are comparing apples and oranges when you remove the order by clause. Without the order by, the processing part of the query only needs to come up with 50 rows.

With the order by, all the rows need to be generated before they are sorted and the top few chosen. How long does the query take if you remove the order by and the limit clause?

The fact that translation.id is a primary key does not make a difference, because the processing requires going through several joins (which filter the results).
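One way to check this is to compare the plans directly. The following is only a sketch: it reuses the query from the question unchanged and wraps it in EXPLAIN (ANALYZE, BUFFERS), which executes the statement and reports actual row counts and timings per plan node. What the planner chooses on your data is an assumption until you run it.

```sql
-- 1. Join and filter only: the cost of producing all matching rows.
EXPLAIN (ANALYZE, BUFFERS)
SELECT translation.id
FROM "TRANSLATION" translation
   INNER JOIN "TRANSLATION_UNIT" unit
     ON translation.fk_id_translation_unit = unit.id
   INNER JOIN "DOCUMENT" document
     ON unit.fk_id_document = document.id
WHERE document.fk_id_job = 11698;

-- 2. The same query with ORDER BY and LIMIT added.
EXPLAIN (ANALYZE, BUFFERS)
SELECT translation.id
FROM "TRANSLATION" translation
   INNER JOIN "TRANSLATION_UNIT" unit
     ON translation.fk_id_translation_unit = unit.id
   INNER JOIN "DOCUMENT" document
     ON unit.fk_id_document = document.id
WHERE document.fk_id_job = 11698
ORDER BY translation.id ASC
LIMIT 50 OFFSET 0;
```

In the second plan, look at how the ordering is obtained. A Sort node on top of the joins means all matching rows were produced first and then sorted. An index scan on the primary key of "TRANSLATION" with the joins underneath means the planner is walking the whole table in id order and discarding every row that does not belong to the job, which can be far slower when the matching rows are a small fraction of the table.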

EDIT:

I wonder how this would work with a CTE that first builds the joined result, followed by an outer query that sorts it and fetches the rows:

with CTE as (
     -- build the filtered join result first; in PostgreSQL 9.2 a CTE is
     -- planned separately from the outer query
     select translation.id
     from "TRANSLATION" translation
          inner join "TRANSLATION_UNIT" unit
          on translation.fk_id_translation_unit = unit.id
          inner join "DOCUMENT" document
          on unit.fk_id_document = document.id
     where document.fk_id_job = 11698
    )
select *
from CTE
-- sort on the CTE's output column
order by id asc
limit 50 offset 0;

answered Oct 23 '22 13:10 by Gordon Linoff


Do you have a composite index in place on translation(fk_id_translation_unit, id)? It seems to me that that would help by avoiding the need to access the translation.id via the table.
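If it is not there yet, creating it could look like the sketch below. The index name is made up for illustration; the table and column names are taken from the query in the question.

```sql
-- Hypothetical index name; columns as used in the question's join and ORDER BY.
CREATE INDEX translation_unit_id_idx
    ON "TRANSLATION" (fk_id_translation_unit, id);

-- Refresh statistics and the visibility map so the planner can consider
-- an index-only scan (available from PostgreSQL 9.2).
VACUUM ANALYZE "TRANSLATION";
```

Whether the planner actually picks the new index depends on your data, so compare the EXPLAIN ANALYZE output before and after.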

answered Oct 23 '22 13:10 by David Aldridge


In case anyone else runs into the same problem: it happened to me, and I solved it by changing the index to an ordered index. I extended the index with the ID column (the primary key column) and an explicit sort direction.

Like this:

create index index_name on SCHEMA.TABLE (id asc, (sent_time IS NULL), some_id_ref, type);

answered Oct 23 '22 13:10 by Bartek K