
Oracle 11g: get all matched occurrences of a regular expression

Tags: regex, oracle

I'm using Oracle 11g and I would like to use REGEXP_SUBSTR to match all occurrences of a given pattern. For example,

 SELECT
  REGEXP_SUBSTR('Txa233141b Ta233141 Ta233142 Ta233147 Ta233148',
  '(^|\s)[A-Za-z]{2}[0-9]{5,}(\s|$)') "REGEXP_SUBSTR"
  FROM DUAL;

returns only the first match, Ta233141, but I would like to get the other occurrences that match the regex as well, i.e. Ta233142, Ta233147 and Ta233148.
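Part of the problem is the pattern itself. Oracle documents that, when a later occurrence is requested, the search resumes at the first character after the previous occurrence, so the trailing `(\s|$)` swallows the space that the next token's leading `(^|\s)` needs. The sketch below models this with Python's `re.finditer`, which also returns non-overlapping matches from left to right. It assumes Python's engine behaves like Oracle's for this simple pattern, and `all_matches` is a hypothetical helper. Note also that the value actually returned includes the surrounding spaces (' Ta233141 ').

```python
import re
from typing import List

# The question's pattern: the trailing (\s|$) consumes the separating space.
QUESTION_PATTERN = r"(^|\s)[A-Za-z]{2}[0-9]{5,}(\s|$)"
# Same pattern without the trailing group, so the space is left for the next match.
ANSWER_PATTERN = r"(^|\s)[A-Za-z]{2}[0-9]{5,}"

TEXT = "Txa233141b Ta233141 Ta233142 Ta233147 Ta233148"


def all_matches(pattern: str, text: str) -> List[str]:
    """Return every non-overlapping match; each search resumes after the previous match."""
    return [m.group(0) for m in re.finditer(pattern, text)]


print(all_matches(QUESTION_PATTERN, TEXT))  # [' Ta233141 ', ' Ta233147 ']
print(all_matches(ANSWER_PATTERN, TEXT))    # [' Ta233141', ' Ta233142', ' Ta233147', ' Ta233148']
```

Iterating occurrences with the original pattern would therefore yield only Ta233141 and Ta233147; dropping the trailing group, as the answer below does, finds all four tokens.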

asked Jul 11 '13 at 14:07 by florins


1 Answer

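This is a little late, but I needed basically the same thing and could not find a good snippet. I needed to search a free-text column of a table for some terms and collect them. As this might be useful to others, I have included a version based on this question. REGEXP_SUBSTR returns only one value per call, but its fourth argument selects which occurrence to return, and Oracle also provides REGEXP_COUNT, which tells you how many matches a given string contains. You can therefore join each row with a list of indexes and select each occurrence in turn, as follows (using the strings from this question as free text in some 'source_table'). Note that the pattern drops the question's trailing (\s|$): that group consumes the space separating two tokens, so the next token's leading \s could not match and every second item would be skipped.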
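To make the count-then-index idea concrete, here is a small Python model of the two Oracle functions. `regexp_count`, `regexp_substr` and `all_occurrences` are hypothetical helpers, not Oracle APIs, and the model only covers the case of searching from position 1 with default match parameters.

```python
import re
from typing import Optional

# The answer's MATCH_EXP (without the question's trailing (\s|$)).
PATTERN = r"(^|\s)[A-Za-z]{2}[0-9]{5,}"


def regexp_count(text: str, pattern: str) -> int:
    """Model of REGEXP_COUNT(text, pattern): number of non-overlapping matches."""
    return sum(1 for _ in re.finditer(pattern, text))


def regexp_substr(text: str, pattern: str, occurrence: int = 1) -> Optional[str]:
    """Model of REGEXP_SUBSTR(text, pattern, 1, occurrence); None stands for NULL."""
    for n, match in enumerate(re.finditer(pattern, text), start=1):
        if n == occurrence:
            return match.group(0)
    return None


def all_occurrences(text: str, pattern: str) -> list:
    """Count the matches first, then fetch occurrences 1..cnt one index at a time."""
    cnt = regexp_count(text, pattern)
    return [regexp_substr(text, pattern, idx) for idx in range(1, cnt + 1)]


line = "Other stuff PH33399 mixed in OS4456908843 this line"
print(regexp_count(line, PATTERN))     # 2
print(all_occurrences(line, PATTERN))  # [' PH33399', ' OS4456908843']
```

In the SQL version, the list comprehension over `range(1, cnt + 1)` corresponds to joining each row with an index table on `idx <= cnt`.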

-- DEFINE and &MATCH_EXP are SQL*Plus / SQL Developer substitution variables.
DEFINE MATCH_EXP = "'(^|\s)[A-Za-z]{2}[0-9]{5,}'"

WITH source_table
     -- Represents some DB table with a 'free_text' column to be checked.
     AS (       ( SELECT 'Txa233141 Ta233141 Ta232 Ta233142 Ta233141 Ta233148'
                             AS free_text FROM dual )
          UNION ( SELECT 'Other stuff PH33399 mixed in OS4456908843 this line'
                             AS free_text FROM dual )
        )
   , source
     -- For some table, select rows of free text and add the number of matches
     -- in the line.
     AS ( SELECT cnt
               , free_text
          FROM ( SELECT RegExp_Count(free_text, &MATCH_EXP) AS cnt
                      , free_text 
                 FROM source_table )
          WHERE cnt > 0 )
   , iota
     -- Index generator
     AS ( SELECT RowNum AS idx
          FROM dual
          CONNECT BY RowNum <= ( SELECT Max(cnt) FROM source ) )
-- Extract the unique 'cnt' matches from each line of 'free_text'.
SELECT UNIQUE
       RegExp_SubStr(s.free_text, &MATCH_EXP, 1, i.idx) AS result
FROM   source s
  JOIN iota i
    ON ( i.idx <= s.cnt )
ORDER BY result ASC
;
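One thing to be aware of: because `(^|\s)` is part of the match, every result of this query carries a leading space (for example ' Ta233141'). Since 11g, REGEXP_SUBSTR also accepts a sixth `subexpr` argument, so wrapping the token in a second group and calling `RegExp_SubStr(s.free_text, '(^|\s)([A-Za-z]{2}[0-9]{5,})', 1, i.idx, NULL, 2)` should return just the token. The Python sketch below mimics the whole query on the two sample rows to show both outputs. `run_query` is a hypothetical helper, and `sorted()` matches `ORDER BY` only under a binary sort (NLS_SORT = BINARY).

```python
import re

# The two sample rows from the answer's source_table CTE.
SOURCE_TABLE = [
    "Txa233141 Ta233141 Ta232 Ta233142 Ta233141 Ta233148",
    "Other stuff PH33399 mixed in OS4456908843 this line",
]

# MATCH_EXP with the token wrapped in a second group: group(0) is the whole
# match (leading space included), group(2) is the token alone.
PATTERN = r"(^|\s)([A-Za-z]{2}[0-9]{5,})"


def run_query(rows, subexpr=0):
    """Mimic the answer's query; subexpr plays the role of Oracle's 6th argument."""
    # source CTE: keep (cnt, matches) only for rows with at least one match.
    source = []
    for text in rows:
        found = list(re.finditer(PATTERN, text))
        if found:
            source.append((len(found), found))
    # iota CTE: indexes 1..MAX(cnt), generated once and shared by every row.
    iota = range(1, max((cnt for cnt, _ in source), default=0) + 1)
    results = set()  # SELECT UNIQUE
    for cnt, found in source:
        for idx in iota:
            if idx <= cnt:  # JOIN iota i ON (i.idx <= s.cnt)
                results.add(found[idx - 1].group(subexpr))
    return sorted(results)  # ORDER BY result ASC, binary order


print(run_query(SOURCE_TABLE))
# [' OS4456908843', ' PH33399', ' Ta233141', ' Ta233142', ' Ta233148']
print(run_query(SOURCE_TABLE, subexpr=2))
# ['OS4456908843', 'PH33399', 'Ta233141', 'Ta233142', 'Ta233148']
```

Ta232 is skipped because it has only three digits, Txa233141 because its third character is a letter, and the duplicate Ta233141 is removed by UNIQUE.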

This approach works for any set of selected rows, and it uses CONNECT BY only once, to generate the indexes 1..MAX(cnt) shared by all rows, which matters because CONNECT BY can be very slow.

answered Oct 17 '22 at 01:10 by Steven Cochran