
Postgres pg_trgm - why ordering by similarity is very slow

Tags:

postgresql

postgresql-9.3

I have a table Users with a column displayName (text) and a pg_trgm GIN index on this column.

CREATE INDEX "Users-displayName-pg-trgm-index"
  ON "Users"
  USING gin
  ("displayName" COLLATE pg_catalog."default" gin_trgm_ops);

Here is my query:

SELECT "User"."id"
    ,"User"."displayName"
    ,"User"."firstName"
    ,"User"."lastName"
    ,"User"."email"
    ,"User"."password"
    ,"User"."isVerified"
    ,"User"."isBlocked"
    ,"User"."verificationToken"
    ,"User"."birthDate"
    ,"User"."gender"
    ,"User"."isPrivate"
    ,"User"."role"
    ,"User"."coverImageUrl"
    ,"User"."profileImageUrl"
    ,"User"."facebookId"
    ,"User"."deviceType"
    ,"User"."deviceToken"
    ,"User"."coins"
    ,"User"."LocaleId"
    ,"User"."createdAt"
    ,"User"."updatedAt"
FROM "Users" AS "User"
WHERE (similarity("User"."displayName", 'John') > 0.2)
ORDER BY similarity("User"."displayName", 'John')
    ,"User"."id" ASC LIMIT 25;

The query above takes ~200ms to return results. When I remove

ORDER BY similarity("User"."displayName", 'John')

and order just by id, the query speeds up to 30ms.

I am querying a table with 50k users.

Here is the EXPLAIN ANALYZE output: http://explain.depesz.com/s/lXC

For some reason I don't see any index usage (GIN pg_trgm on displayName).

It seems that when I replace the line

WHERE (similarity("User"."displayName", 'John') > 0.2)

with

WHERE ("User"."displayName" % 'John')

the query is super-fast. Can anyone tell me why? I thought that the % operator just checks whether similarity(...) is greater than the threshold, so what is the difference?


asked Feb 13 '15 14:02

user606521

2 Answers

The trigram index can only be used through operators (such as %), not through a call to the similarity() function.

The query that filters and orders by similarity() calls that function for every row and then sorts the rows.

The query that uses the % operator uses the index and runs the similarity() function only on the rows that match (there are no index-only scans for functions).
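One detail worth keeping in mind when swapping similarity(...) > 0.2 for %: the operator compares against a session-level threshold, which defaults to 0.3, not 0.2. A minimal sketch of matching the question's cutoff, assuming pg_trgm is installed and using set_limit() (available on 9.3; newer releases also offer the pg_trgm.similarity_threshold setting), with a shortened column list for readability:

```sql
-- Lower the threshold used by the % operator for this session (default is 0.3).
SELECT set_limit(0.2);

-- Same filter as in the question, but written with the operator so the
-- gin_trgm_ops index can be considered by the planner.
SELECT "User"."id",
       "User"."displayName",
       similarity("User"."displayName", 'John') AS sim  -- computed only for matching rows
FROM "Users" AS "User"
WHERE "User"."displayName" % 'John'
ORDER BY sim, "User"."id"   -- ascending similarity, as in the question
LIMIT 25;
```

The sort still happens after the index lookup, but only over the rows that passed the % filter rather than over the whole table.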

If you want the rows with similarity greater than 0.2, ordered by least similarity first (as in the question), you should use the distance operator <->.

Like so:

WHERE "User"."displayName" <-> 'John' < 0.8
ORDER BY "User"."displayName" <-> 'John' DESC

The distance is 1 - similarity, so similarity > 0.2 corresponds to distance < 0.8.
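The relationship between the two can be checked directly. This is a small sketch, assuming the pg_trgm extension is installed; the strings 'John' and 'Johnny' are arbitrary sample values:

```sql
-- The last two columns should agree: <-> is defined as 1 - similarity().
SELECT similarity('John', 'Johnny')     AS sim,
       'John' <-> 'Johnny'              AS dist,
       1 - similarity('John', 'Johnny') AS one_minus_sim;
```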


answered Oct 24 '22 07:10

Jakub Kania

In my experience a GiST index has worked better / faster for similarity ordering.

In this example I have a customer table with ~500k rows.

select *,similarity(coalesce(details::text,'') || coalesce(name,''),'9') 
  from customer 
  order by (coalesce(details::text,'') || coalesce(name,'')) <-> '9' 
  asc limit 50;

Without any index, the query takes around 8.5s with this query plan:

                              QUERY PLAN                                          
-----------------------------------------------------------------------------------
 Limit  (cost=47687.03..47687.16 rows=50 width=1144)
   ->  Sort  (cost=47687.03..49184.52 rows=598995 width=1144)
         Sort Key: (((COALESCE((details)::text, ''::text) ||
                     (COALESCE(name, ''::character varying))::text) <-> '9'::text))
         ->  Seq Scan on customer  (cost=0.00..27788.85 rows=598995 width=1144)
(4 rows)

After adding a GIN index:

CREATE INDEX ON customer USING gin ((coalesce(details::text,'') || coalesce(name,'')) gin_trgm_ops);

Nothing happens. The query plan still looks the same and the query still takes around 8.5 seconds to complete. No index is used for ordering.

After creating a GiST index:

CREATE INDEX ON customer USING gist ((coalesce(details::text,'') || coalesce(name,'')) gist_trgm_ops);

The query takes around 240ms and the query plan shows the index being used:

                     QUERY PLAN                         
--------------------------------------------------------------------------
 Limit  (cost=0.42..10.19 rows=50 width=1144)
   ->  Index Scan using customer_expr_idx1 on customer  (cost=0.42..117106.73 rows=598995 width=1144)
     Order By: ((COALESCE((details)::text, ''::text) || 
                (COALESCE(name, ''::character varying))::text) <-> '9'::text)
(3 rows)
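The difference between the two index types can also be seen in the system catalogs. The following is a sketch, assuming pg_trgm is installed; it lists the operators each trigram operator class registers and whether they are for searching ('s') or ordering ('o'). The exact rows depend on the pg_trgm version, but <-> should appear as an ordering operator only for gist_trgm_ops:

```sql
-- List the operators registered for the two trigram operator classes.
-- amoppurpose: 's' = usable in WHERE (search), 'o' = usable in ORDER BY (ordering).
SELECT am.amname,
       opc.opcname,
       amop.amopopr::regoperator AS operator,
       amop.amoppurpose
FROM pg_opclass AS opc
JOIN pg_am      AS am   ON am.oid = opc.opcmethod
JOIN pg_amop    AS amop ON amop.amopfamily = opc.opcfamily
WHERE opc.opcname IN ('gin_trgm_ops', 'gist_trgm_ops')
ORDER BY am.amname, amop.amopstrategy;
```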

Out of curiosity, the returned rows look like this:

   id   |           name           |        details         | similarity 
--------+--------------------------+------------------------+------------
     25 | Generic Company (9) Inc. |                        |  0.0909091
    125 | Generic Company (9) Inc. |                        |  0.0909091
 268649 | 9bg1ubTCYo7mMcDaHmCC     | { "fatty": "McDaddy" } |  0.0294118
 470217 | 9hSXtDmW9cXvKk4Q6McD     | { "fatty": "McDaddy" } |  0.0285714
 180775 | 9pRPi1w9nqV9999g2ceo     | { "fatty": "McDaddy" } |  0.0285714
 162931 | 9qMyYbWNJLZdv7uYYbOl     | { "fatty": "McDaddy" } |  0.0285714
 176961 | 9ow1NcTjAmCDyRsapDl4     | { "fatty": "McDaddy" } |  0.0285714
   ... etc ...
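Applied to the Users table from the question, the same approach might look like the sketch below. The index name is hypothetical, and the column list is shortened. Note that an index-ordered scan returns the nearest (most similar) rows first, which is the opposite of the ascending-similarity order in the question; a descending distance order would most likely need a separate sort.

```sql
-- Hypothetical index name; gist_trgm_ops supports ordering by <->.
CREATE INDEX "Users-displayName-gist-trgm-index"
  ON "Users"
  USING gist ("displayName" gist_trgm_ops);

-- Most similar display names first. Ordering only by the distance expression
-- lets the planner consider reading rows from the index in that order;
-- adding a tie-breaker column may require an extra sort step.
SELECT "User"."id",
       "User"."displayName",
       "User"."displayName" <-> 'John' AS dist
FROM "Users" AS "User"
WHERE "User"."displayName" % 'John'
ORDER BY "User"."displayName" <-> 'John'
LIMIT 25;
```

Running the SELECT under EXPLAIN ANALYZE should show whether the planner actually picks an index scan with an Order By line, as in the plan above.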

answered Oct 24 '22 08:10

Mikael Lepistö
