I'm performing a regular expression match on a column of type character varying(256)
in PostgreSQL 8.3.3. The column currently has no indices. I'd like to improve the performance of this query if I can.
Will adding an index help? Are there other things I can try to help improve performance?
You cannot create an index that will speed up any generic regular expression; however, if you have one or a limited number of regular expressions that you are matching against, you have a few options.
As Paul Tomblin mentions, you can use an extra column or columns to indicate whether or not a given row matches that regex or regexes. That column can be indexed, and queried efficiently.
If you want to go further than that, this paper discusses an interesting sounding technique for indexing against regular expressions, which involves looking for long substrings in the regex and indexing based on whether those are present in the text to generate candidate matches. That filters down the number of rows that you actually need to check the regex against. You could probably implement this using GiST indexes, though that would be a non-trivial amount of work.
An index can't do anything with a regular expression. You're going to have to do a full table scan.
If at all possible, like if you're querying for the same regex all the time, you could add a column that specifies whether this row matches that regex and maintain that on inserts and updates.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With