Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Postgresql - table with trigram gin index with mixed case or ILike does not use the index

Tags:

postgresql

A table with trigram index, does not work if there is mixed case or ILike in the query. Im not sure what I have missed. Any ideas?

(Im using PostgreSQL 9.6.2)

CREATE TABLE public.tbltest (
    "tbltestId" int NOT null ,
    "mystring1" text,
    "mystring2" character varying,  
    CONSTRAINT "tbltest_pkey" PRIMARY KEY ("tbltestId")
);

insert into tbltest ("tbltestId","mystring1", "mystring2")
select x.id, x.id || ' Test', x.id || ' Test' from generate_series(1,100000) AS x(id);

CREATE EXTENSION pg_trgm;

CREATE INDEX tbltest_idx1 ON tbltest using gin ("mystring1" gin_trgm_ops);
CREATE INDEX tbltest_idx2 ON tbltest using gin ("mystring2" gin_trgm_ops);

Using lower case text in the query works, and uses the index

explain analyse 
select * from tbltest 
where "mystring2" Like '%test%';

QUERY PLAN                                                                                                                   |
-----------------------------------------------------------------------------------------------------------------------------|
Bitmap Heap Scan on tbltest  (cost=20.08..56.68 rows=10 width=24) (actual time=29.846..29.846 rows=0 loops=1)                |
  Recheck Cond: ((mystring2)::text ~~ '%test%'::text)                                                                        |
  Rows Removed by Index Recheck: 100000                                                                                      |
  Heap Blocks: exact=726                                                                                                     |
  ->  Bitmap Index Scan on tbltest_idx2  (cost=0.00..20.07 rows=10 width=0) (actual time=12.709..12.709 rows=100000 loops=1) |
        Index Cond: ((mystring2)::text ~~ '%test%'::text)                                                                    |
Planning time: 0.086 ms                                                                                                      |
Execution time: 29.875 ms                                                                                                    |

Like does not use the index if I add mixed case in the search

explain analyse 
select * from tbltest 
where "mystring2" Like '%Test%';

QUERY PLAN                                                                                                    |
--------------------------------------------------------------------------------------------------------------|
Seq Scan on tbltest  (cost=0.00..1976.00 rows=99990 width=24) (actual time=0.011..33.376 rows=100000 loops=1) |
  Filter: ((mystring2)::text ~~ '%Test%'::text)                                                               |
Planning time: 0.083 ms                                                                                       |
Execution time: 51.259 ms                                                                                     |

ILike does not use the index either

explain analyse 
select * from tbltest 
where "mystring2" ILike '%Test%';

QUERY PLAN                                                                                                    |
--------------------------------------------------------------------------------------------------------------|
Seq Scan on tbltest  (cost=0.00..1976.00 rows=99990 width=24) (actual time=0.012..87.038 rows=100000 loops=1) |
  Filter: ((mystring2)::text ~~* '%Test%'::text)                                                              |
Planning time: 0.134 ms                                                                                       |
Execution time: 105.757 ms                                                                                    |
like image 395
Haza Avatar asked Oct 17 '22 02:10

Haza


1 Answers

PostgreSQL does not use the index in the last two queries because that is the best way to process the query, not because it cannot use it.

In your EXPLAIN output you can see that the first query returns zero rows (actual ... rows=0), while the other two queries return every single row in the table (actual ... rows=100000).

The PostgreSQL optimizer's estimates reflect that situation accurately.

Since it has to access most of the rows of the table anyway, PostgreSQL knows that it will be able to get the result much cheaper if it scans the table sequentially than by using the more complicated index access method.

like image 78
Laurenz Albe Avatar answered Oct 21 '22 01:10

Laurenz Albe