Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why Postgres does not use indexes with OR condition on 2 separate tables

Tags:

postgresql

I am trying to improve the performance of a SQL query on a Postgres 9.4 database. I managed to re-write the query so that it would use indexes and it is superfast now! But I do not quite understand why.

This is the original query:

SELECT DISTINCT dt.id, dt.updated_at
FROM public.day dt
    INNER JOIN public.optimized_localized_day sldt ON sldt.day_id = dt.id
    INNER JOIN public.day_template_place dtp ON dtp.day_template_id = dt.id
    INNER JOIN public.optimized_place op ON op.geoname_id = dtp.geoname_id
WHERE
    op.alternate_localized_names ILIKE unaccent('%query%') OR
    lower(sldt.unaccent_title) LIKE unaccent(lower('%query%')) OR
    lower(sldt.unaccent_description) LIKE unaccent(lower('%query%'))
ORDER BY dt.updated_at DESC
LIMIT 100;

I have placed 3 trigram indexes using pg_trgm on op.alternate_localized_names, lower(sldt.unaccent_title) and lower(sldt.unaccent_description).

But, Postgres is not using them, instead it performs a SeqScan on the full tables to join them as shown by EXPLAIN:

Limit
  ->  Unique
        ->  Sort
              Sort Key: dt.updated_at, dt.id
              ->  Hash Join
                    Hash Cond: (sldt.day_id = dt.id)
                    Join Filter: ((op.alternate_localized_names ~~* unaccent('%query%'::text)) OR (lower(sldt.unaccent_title) ~~ unaccent('%query%'::text)) OR (lower(sldt.unaccent_description) ~~ unaccent('%query%'::text)))
                    ->  Seq Scan on optimized_localized_day sldt
                    ->  Hash
                          ->  Hash Join
                                Hash Cond: (dtp.geoname_id = op.geoname_id)
                                ->  Hash Join
                                      Hash Cond: (dtp.day_template_id = dt.id)
                                      ->  Seq Scan on day_template_place dtp
                                      ->  Hash
                                            ->  Seq Scan on day dt
                                ->  Hash
                                      ->  Seq Scan on optimized_place op

However, when I split the query in 2, one to search on public.optimized_localized_day and one on public.optimized_place, it now uses their indexes:

SELECT DISTINCT dt.id, dt.updated_at
FROM public.day dt
         INNER JOIN public.day_template_place dtp ON dtp.day_template_id = dt.id
         INNER JOIN public.optimized_place op ON op.geoname_id = dtp.geoname_id
WHERE op.alternate_localized_names ILIKE unaccent('%query%')
UNION
SELECT DISTINCT dt.id, dt.updated_at
FROM public.day dt
         INNER JOIN public.optimized_localized_day sldt ON sldt.day_id = dt.id
WHERE lower(sldt.unaccent_title) LIKE unaccent(lower('%query%'))
   OR lower(sldt.unaccent_description) LIKE unaccent(lower('%query%'));

And the EXPLAIN:

HashAggregate
  ->  Append
        ->  HashAggregate
              ->  Nested Loop
                    ->  Nested Loop
                          ->  Bitmap Heap Scan on optimized_place op
                                Recheck Cond: (alternate_localized_names ~~* unaccent('%query%'::text))
                                ->  Bitmap Index Scan on idx_trgm_place_lower
                                      Index Cond: (alternate_localized_names ~~* unaccent('%jericho%'::text))
                          ->  Bitmap Heap Scan on day_template_place dtp
                                Recheck Cond: (geoname_id = op.geoname_id)
                                ->  Bitmap Index Scan on day_template_place_geoname_idx
                                      Index Cond: (geoname_id = op.geoname_id)
                    ->  Index Scan using day_pkey on day dt
                          Index Cond: (id = dtp.day_template_id)
        ->  HashAggregate
              ->  Nested Loop
                    ->  Bitmap Heap Scan on optimized_localized_day sldt
                          Recheck Cond: ((lower(unaccent_title) ~~ unaccent('%query%'::text)) OR (lower(unaccent_description) ~~ unaccent('%query%'::text)))
                          ->  BitmapOr
                                ->  Bitmap Index Scan on tgrm_idx_localized_day_title
                                      Index Cond: (lower(unaccent_title) ~~ unaccent('%query%'::text))
                                ->  Bitmap Index Scan on tgrm_idx_localized_day_description
                                      Index Cond: (lower(unaccent_description) ~~ unaccent('%query%'::text))
                    ->  Index Scan using day_pkey on day dt_1
                          Index Cond: (id = sldt.day_id)

From what I understand, having conditions on 2 separate tables in an OR clause causes Postgres to join the tables first and then filter them. But I am not sure about this. Second thing that puzzles me, I would like to understand how Postgres manages the filtering in the second query.

Do you guys know how Postgres handles those 2 cases ? Thanks :)

like image 289
RomWW12 Avatar asked Oct 18 '25 14:10

RomWW12


1 Answers

The transformation of the original query to the UNION cannot be made automatically.

Consider a simplified case:

SELECT x.a, y.b
FROM x JOIN y USING (c)
WHERE x.a = 0 OR x.b = 0;

Imagine it has three result rows:

 a | b
---+---
 0 | 0
 1 | 0
 1 | 0
  • If you replace this with

    SELECT x.a, y.b
    FROM x JOIN y USING (c)
    WHERE x.a = 0
    UNION
    SELECT x.a, y.b
    FROM x JOIN y USING (c)
    WHERE y.b = 0;
    

    the result will only have two rows, because UNION removes duplicates.

  • If you use UNION ALL instead, the result will have four rows, because the row with the two zeros will appear twice, once from each branch of the query.

So this transformation cannot always be made safely. In your case, you can get away with it, because you remove duplicates anyway.

By the way: if you use UNION, you don't need the DISTINCT any more, because duplicates will be removed anyway. Your query will become cheaper if you remove the DISTINCTs.

In the second branch of your second query, PostgreSQL can handle the OR with index scans because the conditions are on the same table. In that case, PostgreSQL can perform a bitmap index scan:

  • The index is scanned, and PostgreSQL build a bitmap in memory that contains 1 for each table row where the index scan results in a match and 0 otherwise.

    This bitmap is ordered in the physical order of the table rows.

  • The same thing happens for the other condition with the other index.

  • The resulting bitmaps are joined with a bit-wise OR operation.

  • The resulting bitmap is used to fetch the matching rows from the table.

    A trigram index is only a filter that can have false positive results, so the original condition has to be re-checked during that table scan.

like image 145
Laurenz Albe Avatar answered Oct 21 '25 13:10

Laurenz Albe



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!