Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Creating case-insensitive indexes on Postgres string array

I'm using a varchar[] column (varchar array) in Postgres 9.2 to store some tags. While retrieving rows by tags, I want the query to be case insensitive. However, I want to preserve the case to display in the UI (therefore I can't just store everything as lower case).

So, my question is how do I create a case-insensitive index in Postgres over a varchar array? One possible approach would be to create a functional GIN index on the column. How does one do that? Any other approaches?

like image 694
Saurabh Nanda Avatar asked Mar 24 '23 08:03

Saurabh Nanda


1 Answers

@Saurabh Nanda: Similar to what you posted, you can also create a simple function to convert your varchar array to lowercase as follows:

CREATE OR REPLACE FUNCTION array_lowercase(varchar[]) RETURNS varchar[] AS
$BODY$
  SELECT array_agg(q.tag) FROM (
    SELECT btrim(lower(unnest($1)))::varchar AS tag
  ) AS q;
$BODY$
  language sql IMMUTABLE;

Note that I'm also trimming the tags of spaces. This might not be necessary for you but I usually do for consistency.

Testing:

SELECT array_lowercase(array['Hello','WOrLD']);
 array_lowercase 
-----------------
 {hello,world}
(1 row)

As noted by Saurabh, you can then create a GIN index:

CREATE INDEX ix_tags ON tagtable USING GIN(array_lowercase(tags));

And query:

SELECT * FROM tagtable WHERE ARRAY['mytag'::varchar] && array_lowercase(tags);

UPDATE: Performance of WHILE vs array_agg/unnest

I created table of 100K 10 element text[] arrays (12 character random mixed case strings) and tested each function.

The array_agg/unnest function returned:

EXPLAIN ANALYZE VERBOSE SELECT array_lowercase(data) FROM test;
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.test  (cost=0.00..28703.00 rows=100000 width=184) (actual time=0.320..3041.292 rows=100000 loops=1)
   Output: array_lowercase((data)::character varying[])
 Total runtime: 3174.690 ms
(3 rows)

The WHILE function returned:

EXPLAIN ANALYZE VERBOSE SELECT array_lowercase_while(data) FROM test;
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.test  (cost=0.00..28703.00 rows=100000 width=184) (actual time=5.128..4356.647 rows=100000 loops=1)
   Output: array_lowercase_while((data)::character varying[])
 Total runtime: 4485.226 ms
(3 rows)

UPDATE 2: FOREACH vs. WHILE As a final experiment, I changed the WHILE function to use FOREACH:

CREATE OR REPLACE FUNCTION array_lowercase_foreach(p_input varchar[]) RETURNS varchar[] AS $BODY$
DECLARE
    el text;
    r varchar[];
BEGIN
    FOREACH el IN ARRAY p_input LOOP
        r := r || btrim(lower(el))::varchar;
    END LOOP;
    RETURN r;
END;
$BODY$
  language 'plpgsql'

Results appeared to be similar to WHILE:

EXPLAIN ANALYZE VERBOSE SELECT array_lowercase_foreach(data) FROM test;
                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.test  (cost=0.00..28703.00 rows=100000 width=184) (actual time=0.707..4106.867 rows=100000 loops=1)
   Output: array_lowercase_foreach((data)::character varying[])
 Total runtime: 4239.958 ms
(3 rows)

Though my tests are not by any means rigorous, I did run each version a number of times and found the numbers to be representative, suggesting that the SQL method (array_agg/unnest) is the fastest.

like image 111
marcj Avatar answered Apr 05 '23 23:04

marcj