I have the following query:
SELECT translation.id
FROM "TRANSLATION" translation
INNER JOIN "UNIT" unit
ON translation.fk_id_translation_unit = unit.id
INNER JOIN "DOCUMENT" document
ON unit.fk_id_document = document.id
WHERE document.fk_id_job = 3665
ORDER BY translation.id ASC
LIMIT 50
It runs for a dreadful 110 seconds.
The table sizes:
+----------------+-------------+
| Table | Records |
+----------------+-------------+
| TRANSLATION | 6,906,679 |
| UNIT | 6,906,679 |
| DOCUMENT | 42,321 |
+----------------+-------------+
However, when I change the LIMIT
parameter from 50 to 1000, the query finishes in 2 seconds.
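The timings and plans below were gathered with EXPLAIN ANALYZE, which actually executes the query while timing each plan node, along these lines:
EXPLAIN ANALYZE
SELECT translation.id
FROM "TRANSLATION" translation
INNER JOIN "UNIT" unit ON translation.fk_id_translation_unit = unit.id
INNER JOIN "DOCUMENT" document ON unit.fk_id_document = document.id
WHERE document.fk_id_job = 3665
ORDER BY translation.id ASC
LIMIT 50;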
Here is the query plan for the slow one:
Limit  (cost=0.00..146071.52 rows=50 width=8) (actual time=111916.180..111917.626 rows=50 loops=1)
  ->  Nested Loop  (cost=0.00..50748166.14 rows=17371 width=8) (actual time=111916.179..111917.624 rows=50 loops=1)
        Join Filter: (unit.fk_id_document = document.id)
        ->  Nested Loop  (cost=0.00..39720545.91 rows=5655119 width=16) (actual time=0.051..15292.943 rows=5624514 loops=1)
              ->  Index Scan using "TRANSLATION_pkey" on "TRANSLATION" translation  (cost=0.00..7052806.78 rows=5655119 width=16) (actual time=0.039..1887.757 rows=5624514 loops=1)
              ->  Index Scan using "UNIT_pkey" on "UNIT" unit  (cost=0.00..5.76 rows=1 width=16) (actual time=0.002..0.002 rows=1 loops=5624514)
                    Index Cond: (unit.id = translation.fk_id_translation_unit)
        ->  Materialize  (cost=0.00..138.51 rows=130 width=8) (actual time=0.000..0.006 rows=119 loops=5624514)
              ->  Index Scan using "DOCUMENT_idx_job" on "DOCUMENT" document  (cost=0.00..137.86 rows=130 width=8) (actual time=0.025..0.184 rows=119 loops=1)
                    Index Cond: (fk_id_job = 3665)
and here is the plan for the fast one:
Limit  (cost=523198.17..523200.67 rows=1000 width=8) (actual time=2274.830..2274.988 rows=1000 loops=1)
  ->  Sort  (cost=523198.17..523241.60 rows=17371 width=8) (actual time=2274.829..2274.895 rows=1000 loops=1)
        Sort Key: translation.id
        Sort Method:  top-N heapsort  Memory: 95kB
        ->  Nested Loop  (cost=139.48..522245.74 rows=17371 width=8) (actual time=0.095..2252.710 rows=97915 loops=1)
              ->  Hash Join  (cost=139.48..420861.93 rows=17551 width=8) (actual time=0.079..2005.238 rows=97915 loops=1)
                    Hash Cond: (unit.fk_id_document = document.id)
                    ->  Seq Scan on "UNIT" unit  (cost=0.00..399120.41 rows=5713741 width=16) (actual time=0.008..1200.547 rows=6908070 loops=1)
                    ->  Hash  (cost=137.86..137.86 rows=130 width=8) (actual time=0.065..0.065 rows=119 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 5kB
                          ->  Index Scan using "DOCUMENT_idx_job" on "DOCUMENT" document  (cost=0.00..137.86 rows=130 width=8) (actual time=0.009..0.041 rows=119 loops=1)
                                Index Cond: (fk_id_job = 3665)
              ->  Index Scan using "TRANSLATION_idx_unit" on "TRANSLATION" translation  (cost=0.00..5.76 rows=1 width=16) (actual time=0.002..0.002 rows=1 loops=97915)
                    Index Cond: (translation.fk_id_translation_unit = unit.id)
Apparently the execution plans are very different, and the second one makes the query about 50 times faster.
I have indexes on all the fields involved in the query and I ran ANALYZE
on all tables just before running the queries.
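That is, something along these lines:
ANALYZE "TRANSLATION";
ANALYZE "UNIT";
ANALYZE "DOCUMENT";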
Can somebody see what is wrong with the first query?
UPDATE: Table definitions
CREATE TABLE "public"."TRANSLATION" (
"id" BIGINT NOT NULL,
"fk_id_translation_unit" BIGINT NOT NULL,
"translation" TEXT NOT NULL,
"fk_id_language" INTEGER NOT NULL,
"relevance" INTEGER,
CONSTRAINT "TRANSLATION_pkey" PRIMARY KEY("id"),
CONSTRAINT "TRANSLATION_fk" FOREIGN KEY ("fk_id_translation_unit")
REFERENCES "public"."UNIT"("id")
ON DELETE CASCADE
ON UPDATE NO ACTION
DEFERRABLE
INITIALLY DEFERRED,
CONSTRAINT "TRANSLATION_fk1" FOREIGN KEY ("fk_id_language")
REFERENCES "public"."LANGUAGE"("id")
ON DELETE NO ACTION
ON UPDATE NO ACTION
NOT DEFERRABLE
) WITHOUT OIDS;
CREATE INDEX "TRANSLATION_idx_unit" ON "public"."TRANSLATION"
USING btree ("fk_id_translation_unit");
CREATE INDEX "TRANSLATION_language_idx" ON "public"."TRANSLATION"
USING hash ("translation");
CREATE TABLE "public"."UNIT" (
"id" BIGINT NOT NULL,
"text" TEXT NOT NULL,
"fk_id_language" INTEGER NOT NULL,
"fk_id_document" BIGINT NOT NULL,
"word_count" INTEGER DEFAULT 0,
CONSTRAINT "UNIT_pkey" PRIMARY KEY("id"),
CONSTRAINT "UNIT_fk" FOREIGN KEY ("fk_id_document")
REFERENCES "public"."DOCUMENT"("id")
ON DELETE CASCADE
ON UPDATE NO ACTION
NOT DEFERRABLE,
CONSTRAINT "UNIT_fk1" FOREIGN KEY ("fk_id_language")
REFERENCES "public"."LANGUAGE"("id")
ON DELETE NO ACTION
ON UPDATE NO ACTION
NOT DEFERRABLE
) WITHOUT OIDS;
CREATE INDEX "UNIT_idx_document" ON "public"."UNIT"
USING btree ("fk_id_document");
CREATE INDEX "UNIT_text_idx" ON "public"."UNIT"
USING hash ("text");
CREATE TABLE "public"."DOCUMENT" (
"id" BIGINT NOT NULL,
"fk_id_job" BIGINT,
CONSTRAINT "DOCUMENT_pkey" PRIMARY KEY("id"),
CONSTRAINT "DOCUMENT_fk" FOREIGN KEY ("fk_id_job")
REFERENCES "public"."JOB"("id")
ON DELETE SET NULL
ON UPDATE NO ACTION
NOT DEFERRABLE
) WITHOUT OIDS;
UPDATE: Database parameters
shared_buffers = 2048MB
effective_cache_size = 4096MB
work_mem = 32MB
Total memory: 32GB
CPU: Intel Xeon X3470 @ 2.93 GHz, 8MB cache
The PostgreSQL LIMIT clause is used to get a subset of the rows generated by a query. It is an optional clause of the SELECT statement, and it can be combined with the OFFSET clause to skip a specific number of rows before rows are returned.
The number of columns involved will, indirectly, influence performance, and the data in those columns also affects the speed.
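A quick way to see this is the width figure that EXPLAIN reports; the plans above show width=8 because only the 8-byte translation.id is selected. A minimal sketch against the UNIT table from the schema above:
-- Narrow rows: only the 8-byte id column is carried through the plan (small width).
EXPLAIN SELECT id FROM "UNIT";
-- Wide rows: every column, including the TEXT column, is carried along (larger width).
EXPLAIN SELECT * FROM "UNIT";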
In order to determine a reasonable (not necessarily optimal) query plan in a reasonable amount of time, PostgreSQL uses a Genetic Query Optimizer (see Chapter 62) when the number of joins exceeds a threshold (see geqo_threshold).
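This particular query joins only three relations, so GEQO should not be involved here, but the threshold can be inspected and changed per session (the value 15 is only illustrative):
SHOW geqo_threshold;      -- defaults to 12
SET geqo_threshold = 15;  -- use the exhaustive planner for up to 14 FROM items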
If a limit count is given, no more than that many rows will be returned (but possibly fewer, if the query itself yields fewer rows). LIMIT ALL is the same as omitting the LIMIT clause, as is LIMIT with a NULL argument. OFFSET says to skip that many rows before beginning to return rows.
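For example, against the schema above, this fetches the second page of 50 ids (a toy illustration):
SELECT id
FROM "TRANSLATION"
ORDER BY id
LIMIT 50 OFFSET 50;  -- skip the first 50 rows, then return at most 50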
Here is an interesting part of the official ANALYZE documentation:
For large tables, ANALYZE takes a random sample of the table contents, rather than examining every row. [...] The extent of analysis can be controlled by adjusting the default_statistics_target configuration variable, or on a column-by-column basis by setting the per-column statistics target with ALTER TABLE ... ALTER COLUMN ... SET STATISTICS.
Raising the statistics target is apparently a common way to improve a bad query plan: ANALYZE becomes a bit slower, but the resulting plan can be better. Per column, that looks like this (the column and target value here are only illustrative):
ALTER TABLE "UNIT" ALTER COLUMN "fk_id_document" SET STATISTICS 1000;
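The same can be done globally for a session instead of per column (the target value 500 is illustrative):
SET default_statistics_target = 500;  -- per-session override of the sampling target
ANALYZE "UNIT";                       -- re-sample the table with the higher target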