Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Oracle - Determine maximum supported size for regular expression

Tags:

regex

oracle

I have a regular expression that throws ORA-12733, "regular expression is too long". How do I determine what the maximum supported size is?

FYI: the offending regex is 892 characters. It's a generated regex, so I could change how I generate and execute it, but I would like to know what the limits to the max size are before I change how I am generating and executing.

(running Oracle 10.2g)

UPDATE:

If it depends on the actual regex, here's the begining of it (the rest is just the same thing repeated, with different values between ^ and $):

(^R_1A$|^R_2A$|^R_3A$|^R_4A$|^R_4B$|^R_5A$|^R_5B$...

like image 850
FrustratedWithFormsDesigner Avatar asked Apr 22 '10 19:04

FrustratedWithFormsDesigner


People also ask

Does Oracle support regex?

Usage ranges from the simple (for example, find the word San Francisco in a specified text) to the complex (for example, extract all URLs from the text) to the more complex (for instance, find all words in which every second character is a vowel). Oracle Database 10g introduces support for regular expressions.

What is the use of Regexp_substr in Oracle?

REGEXP_SUBSTR extends the functionality of the SUBSTR function by letting you search a string for a regular expression pattern. It is also similar to REGEXP_INSTR , but instead of returning the position of the substring, it returns the substring itself.

Is REGEXP_LIKE faster than like?

Better Use of LIKE Query instead of REGEXP if you are not sure about value. Also LIKE is much faster than REGEXP.

What does REGEXP_LIKE return?

The REGEXP_LIKE scalar function returns a boolean value indicating if the regular expression pattern is found in a string. The function can be used only where a predicate is supported. The schema is SYSIBM. An expression that specifies the string in which the search is to take place.

Is REGEXP_LIKE case sensitive?

Specifies that matching is case insensitive. This value must not be specified with a value of 'c'. Specifies that the input data could contain more than one line. By default, the '^' and the '$' in a pattern will only match the start and the end, respectively, of the input string.


2 Answers

Looking at the documentation for the regex functions, REGEXP_SUBSTR, REGEXP_INSTR & REGEXP_REPLACE it has the following quote for the pattern:

pattern is the regular expression. It is usually a text literal and can be of any of the datatypes CHAR, VARCHAR2, NCHAR, or NVARCHAR2. It can contain up to 512 bytes. If the datatype of pattern is different from the datatype of source_char, Oracle Database converts pattern to the datatype of source_char. For a listing of the operators you can specify in pattern`**

Taken from here

like image 73
Ian Carpenter Avatar answered Sep 28 '22 07:09

Ian Carpenter


The sample regex should not need all the start/end of line anchors. ^(R_1A|R_2A|R_3A|R_4A|R_4B|R_5A|R_5B)$ would work just as fine.

Actually: If the search tokens are really as similar as in the sample, one might want to benefit from it with ^(R_[1-5]A|R_[4-5]B)$ or ^(R_([1-5]A|[4-5]B))$ (for the search string's part given in the question).

Verified in 11.2:

SELECT i, t FROM (
  SELECT 1 i, 'R_1A' t FROM DUAL UNION ALL
  SELECT 2,   'xR_2A'  FROM DUAL UNION ALL
  SELECT 3,   'R_3Ax'  FROM DUAL UNION ALL
  SELECT 4,   'xR_4Ax' FROM DUAL UNION ALL
  SELECT 5,   'R_4B'   FROM DUAL UNION ALL
  SELECT 6,   'R_5A'   FROM DUAL UNION ALL
  SELECT 7,   'R_5B'   FROM DUAL)
--WHERE REGEXP_LIKE(t, '(^R_1A$|^R_2A$|^R_3A$|^R_4A$|^R_4B$|^R_5A$|^R_5B$)')
--WHERE REGEXP_LIKE(t, '^(R_1A|R_2A|R_3A|R_4A|R_4B|R_5A|R_5B)$')
--WHERE REGEXP_LIKE(t, '^(R_[1-5]A|R_[4-5]B)$')
WHERE REGEXP_LIKE(t, '^(R_([1-5]A|[4-5]B))$')
ORDER BY i;
like image 21
Abecee Avatar answered Sep 28 '22 06:09

Abecee