Oracle - need to extract text between given strings

Tags: substring, regex, sql, oracle, plsql

Example - need to extract everything between "Begin begin" and "End end". I tried this way:

with phrases as (
  select 'stackoverflow is awesome. Begin beginHello, World!End end It has everything!' as phrase
    from dual
         )
select regexp_replace(phrase
     , '([[:print:]]+Begin begin)([[:print:]]+)(End end[[:print:]]+)', '\2')
  from phrases
       ;

Result: Hello, World!

However, it fails if my text contains newline characters. Any tips on how to fix this so that it can also extract text containing newlines?

[edit] How it fails:

with phrases as (
  select 'stackoverflow is awesome. Begin beginHello, 
  World!End end It has everything!' as phrase
    from dual
         )
select regexp_replace(phrase
     , '([[:print:]]+Begin begin)([[:print:]]+)(End end[[:print:]]+)', '\2')
  from phrases
       ;

Result:

stackoverflow is awesome. Begin beginHello, World!End end It has everything!

Should be:

Hello,
World!

[edit]

Another issue. Consider this sample:

WITH phrases AS (
  SELECT 'stackoverflow is awesome. Begin beginHello,
 World!End end It has everything!End endTESTESTESTES' AS phrase
    FROM dual
)
SELECT REGEXP_REPLACE(phrase, '.+Begin begin(.+)End end.+', '\1', 1, 1, 'n')
  FROM phrases;

Result:

Hello,
World!End end It has everything!

So it matches the last occurrence of the end string, and this is not what I want. The substring should be extracted up to the first occurrence of my label, so the result should be:

Hello,
World!

Everything after the first occurrence of the label string should be ignored. Any ideas?

asked Feb 23 '15 13:02

user1209216

2 Answers

I'm not that familiar with the POSIX [[:print:]] character class, but I got your query working using the dot wildcard (.). You need to specify the n match parameter in REGEXP_REPLACE() so that the dot can match the newline character:

WITH phrases AS (
  SELECT 'stackoverflow is awesome. Begin beginHello,
 World!End end It has everything!' AS phrase
    FROM dual
)
SELECT REGEXP_REPLACE(phrase, '.+Begin begin(.+)End end.+', '\1', 1, 1, 'n')
  FROM phrases;

I used the \1 backreference as I didn't see the need to capture the other groups from the regular expression. It might also be a good idea to use the * quantifier (instead of +) in case there is nothing preceding or following the delimiters. If you want to capture all of the groups then you can use the following:

WITH phrases AS (
  SELECT 'stackoverflow is awesome. Begin beginHello,
 World!End end It has everything!' AS phrase
    FROM dual
)
SELECT REGEXP_REPLACE(phrase, '(.+Begin begin)(.+)(End end.+)', '\2', 1, 1, 'n')
  FROM phrases;
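
To illustrate the point about the * quantifier, here is a minimal sketch with a made-up phrase in which nothing precedes or follows the delimiters. With + the pattern cannot match at all, so REGEXP_REPLACE() should return the input unchanged; with * the leading and trailing parts are allowed to be empty. The column aliases are only for illustration.

```sql
WITH phrases AS (
  -- Hypothetical input: the delimiters sit at the very start and very end
  SELECT 'Begin beginHello,' || CHR(10) || ' World!End end' AS phrase
    FROM dual
)
SELECT REGEXP_REPLACE(phrase, '.+Begin begin(.+)End end.+', '\1', 1, 1, 'n') AS with_plus  -- no match: phrase comes back as-is
     , REGEXP_REPLACE(phrase, '.*Begin begin(.+)End end.*', '\1', 1, 1, 'n') AS with_star  -- only the text between the delimiters
  FROM phrases;
```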

UPDATE - FYI, I tested with [[:print:]] and it doesn't work. This is not surprising, since [[:print:]] is supposed to match only printable characters. It doesn't match control characters, that is, anything with an ASCII value below 32 (32 being the space), and the newline is ASCII 10. You need to use the dot (.) instead.
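
A quick way to check this claim on your own database is to test single characters against the POSIX classes. This is only a sketch; the column aliases are mine. I would expect the letter to match [[:print:]], and the newline (CHR(10)) to match [[:cntrl:]] but not [[:print:]].

```sql
SELECT CASE WHEN REGEXP_LIKE('A',     '[[:print:]]') THEN 'match' ELSE 'no match' END AS letter_print
     , CASE WHEN REGEXP_LIKE(CHR(10), '[[:print:]]') THEN 'match' ELSE 'no match' END AS newline_print
     , CASE WHEN REGEXP_LIKE(CHR(10), '[[:cntrl:]]') THEN 'match' ELSE 'no match' END AS newline_cntrl
  FROM dual;
```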

UPDATE #2 - per update to question - I don't think a regex will work the way you want it to. Adding the lazy quantifier to (.+) has no effect and Oracle regular expressions don't have lookahead. There are a couple of things you might do, one is to use INSTR() and SUBSTR():

WITH phrases AS (
  SELECT 'stackoverflow is awesome. Begin beginHello,
 World!End end It has everything!End endTESTTESTTEST' AS phrase
    FROM dual
)
SELECT SUBSTR(phrase, str_start, str_end - str_start) FROM (
    SELECT INSTR(phrase, 'Begin begin') + LENGTH('Begin begin') AS str_start
         , INSTR(phrase, 'End end') AS str_end, phrase
      FROM phrases
);
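
The query above assumes that both labels are present and that the first 'End end' in the string comes after 'Begin begin'. A slightly more defensive sketch, under my own assumptions, passes a start position as the third argument of INSTR() so that the end label is only searched for after the begin label, and returns NULL when either label is missing. The CTE names positions and bounds and the sample phrase are made up for illustration.

```sql
WITH phrases AS (
  -- Hypothetical input with a stray 'End end' before the begin label
  SELECT 'End end noise. Begin beginHello,' || CHR(10)
      || ' World!End end It has everything!End endTESTTESTTEST' AS phrase
    FROM dual
),
positions AS (
  SELECT phrase
       , INSTR(phrase, 'Begin begin') AS begin_pos  -- 0 if the label is absent
    FROM phrases
),
bounds AS (
  SELECT phrase
       , begin_pos
       , begin_pos + LENGTH('Begin begin') AS str_start
         -- look for the end label only after the begin label
       , INSTR(phrase, 'End end', begin_pos + LENGTH('Begin begin')) AS str_end
    FROM positions
)
SELECT CASE
         WHEN begin_pos > 0 AND str_end > 0
         THEN SUBSTR(phrase, str_start, str_end - str_start)
       END AS extracted  -- NULL when either label is missing
  FROM bounds;
```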

Another is to combine INSTR() and SUBSTR() with a regular expression:

WITH phrases AS (
  SELECT 'stackoverflow is awesome. Begin beginHello,
 World!End end It has everything!End endTESTTESTTEST' AS phrase
    FROM dual
)
SELECT REGEXP_REPLACE(SUBSTR(phrase, 1, INSTR(phrase, 'End end') + LENGTH('End end')), '.+Begin begin(.+)End end.+', '\1', 1, 1, 'n')
  FROM phrases;
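
A further variation, offered as a sketch rather than a definitive fix: it assumes Oracle 11g or later, where REGEXP_SUBSTR() accepts a sixth argument that selects a subexpression. Unlike the patterns above, the leading and trailing .+ are dropped, so the lazy .+? is the only quantifier in the pattern and the match should stop at the first 'End end'.

```sql
WITH phrases AS (
  SELECT 'stackoverflow is awesome. Begin beginHello,' || CHR(10)
      || ' World!End end It has everything!End endTESTTESTTEST' AS phrase
    FROM dual
)
SELECT REGEXP_SUBSTR(phrase
                   , 'Begin begin(.+?)End end'  -- lazy: stop at the first 'End end'
                   , 1    -- start at the beginning of the phrase
                   , 1    -- first occurrence of the pattern
                   , 'n'  -- let the dot match newline characters
                   , 1    -- return only the first subexpression
       ) AS extracted
  FROM phrases;
```

If no match is found, REGEXP_SUBSTR() returns NULL rather than the original string, which may or may not be what you want compared with REGEXP_REPLACE().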

answered Sep 28 '22 06:09

David Faber

Try this regex:

([[:print:]]+Begin begin)(.+?)(End end[[:print:]]+)

Sample usage:

SELECT regexp_replace(
         phrase ,
         '([[:print:]]+Begin begin)(.+?)(End end[[:print:]]+)',
         '\2',
         1,  -- Start at the beginning of the phrase
         0,  -- Replace ALL occurrences
         'n' -- Let the dot metacharacter match the newline character
)
FROM
  (SELECT 'stackoverflow is awesome. Begin beginHello, '
    || chr(10)
    || ' World!End end It has everything!' AS phrase
  FROM DUAL
  )

By default, the dot metacharacter (.) matches any character in the database character set except the newline character. For the dot to match newlines as well, the match_parameter passed to regexp_replace must contain the n switch.
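
The effect of the n switch can be checked in isolation with a small sketch, assuming Oracle 11g or later for REGEXP_COUNT(). The three-character test string and the column aliases are made up. Without the switch the dot should match only the two letters; with it, the newline should be counted as well.

```sql
SELECT REGEXP_COUNT('a' || CHR(10) || 'b', '.')         AS dot_default  -- newline not matched
     , REGEXP_COUNT('a' || CHR(10) || 'b', '.', 1, 'n') AS dot_with_n   -- newline matched too
  FROM dual;
```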

answered Sep 28 '22 07:09

Stephan
