Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What are some ways I can improve the performance of a regular expression query in PostgreSQL 8?

I'm performing a regular expression match on a column of type character varying(256) in PostgreSQL 8.3.3. The column currently has no indices. I'd like to improve the performance of this query if I can.

Will adding an index help? Are there other things I can try to help improve performance?

like image 344
emmby Avatar asked Mar 30 '09 20:03

emmby


2 Answers

You cannot create an index that will speed up any generic regular expression; however, if you have one or a limited number of regular expressions that you are matching against, you have a few options.

As Paul Tomblin mentions, you can use an extra column or columns to indicate whether or not a given row matches that regex or regexes. That column can be indexed, and queried efficiently.

If you want to go further than that, this paper discusses an interesting sounding technique for indexing against regular expressions, which involves looking for long substrings in the regex and indexing based on whether those are present in the text to generate candidate matches. That filters down the number of rows that you actually need to check the regex against. You could probably implement this using GiST indexes, though that would be a non-trivial amount of work.

like image 68
Brian Campbell Avatar answered Oct 09 '22 05:10

Brian Campbell


An index can't do anything with a regular expression. You're going to have to do a full table scan.

If at all possible, like if you're querying for the same regex all the time, you could add a column that specifies whether this row matches that regex and maintain that on inserts and updates.

like image 21
Paul Tomblin Avatar answered Oct 09 '22 04:10

Paul Tomblin