Example - need to extract everything between "Begin begin" and "End end". I tried this way:
with phrases as (
select 'stackoverflow is awesome. Begin beginHello, World!End end It has everything!' as phrase
from dual
)
select regexp_replace(phrase
, '([[:print:]]+Begin begin)([[:print:]]+)(End end[[:print:]]+)', '\2')
from phrases
;
Result: Hello, World!
However it fails if my text contains new line characters. Any tip how to fix this to allow extracting text containing also new lines?
[edit]How does it fail:
with phrases as (
select 'stackoverflow is awesome. Begin beginHello,
World!End end It has everything!' as phrase
from dual
)
select regexp_replace(phrase
, '([[:print:]]+Begin begin)([[:print:]]+)(End end[[:print:]]+)', '\2')
from phrases
;
Result:
stackoverflow is awesome. Begin beginHello, World!End end It has everything!
Should be:
Hello,
World!
[edit]
Another issue. Let's see to this sample:
WITH phrases AS (
SELECT 'stackoverflow is awesome. Begin beginHello,
World!End end It has everything!End endTESTESTESTES' AS phrase
FROM dual
)
SELECT REGEXP_REPLACE(phrase, '.+Begin begin(.+)End end.+', '\1', 1, 1, 'n')
FROM phrases;
Result:
Hello,
World!End end It has everything!
So it matches last occurence of end string and this is not what I want. Subsgtring should be extreacted to first occurence of my label, so result should be:
Hello,
World!
Everything after first occurence of label string should be ignored. Any ideas?
Use a SUBSTR() function. The first argument is the string or the column name. The second argument is the index of the character at which the substring should begin. The third argument is the length of the substring.
To extract part string between two different characters, you can do as this: Select a cell which you will place the result, type this formula =MID(LEFT(A1,FIND(">",A1)-1),FIND("<",A1)+1,LEN(A1)), and press Enter key. Note: A1 is the text cell, > and < are the two characters you want to extract string between.
REGEXP_SUBSTR extends the functionality of the SUBSTR function by letting you search a string for a regular expression pattern. It is also similar to REGEXP_INSTR , but instead of returning the position of the substring, it returns the substring itself.
Description This is a small pipelined table function that gets one string that includes a delimited list of values, and returns these values as a table.
I'm not that familiar with the POSIX [[:print:]]
character class but I got your query functioning using the wildcard .
. You need to specify the n
match parameter in REGEXP_REPLACE()
so that .
can match the newline character:
WITH phrases AS (
SELECT 'stackoverflow is awesome. Begin beginHello,
World!End end It has everything!' AS phrase
FROM dual
)
SELECT REGEXP_REPLACE(phrase, '.+Begin begin(.+)End end.+', '\1', 1, 1, 'n')
FROM phrases;
I used the \1
backreference as I didn't see the need to capture the other groups from the regular expression. It might also be a good idea to use the *
quantifier (instead of +
) in case there is nothing preceding or following the delimiters. If you want to capture all of the groups then you can use the following:
WITH phrases AS (
SELECT 'stackoverflow is awesome. Begin beginHello,
World!End end It has everything!' AS phrase
FROM dual
)
SELECT REGEXP_REPLACE(phrase, '(.+Begin begin)(.+)(End end.+)', '\2', 1, 1, 'n')
FROM phrases;
UPDATE - FYI, I tested with [[:print:]]
and it doesn't work. This is not surprising since [[:print:]]
is supposed to match printable characters. It doesn't match anything with an ASCII value below 32 (a space). You need to use .
.
UPDATE #2 - per update to question - I don't think a regex will work the way you want it to. Adding the lazy quantifier to (.+)
has no effect and Oracle regular expressions don't have lookahead. There are a couple of things you might do, one is to use INSTR()
and SUBSTR()
:
WITH phrases AS (
SELECT 'stackoverflow is awesome. Begin beginHello,
World!End end It has everything!End endTESTTESTTEST' AS phrase
FROM dual
)
SELECT SUBSTR(phrase, str_start, str_end - str_start) FROM (
SELECT INSTR(phrase, 'Begin begin') + LENGTH('Begin begin') AS str_start
, INSTR(phrase, 'End end') AS str_end, phrase
FROM phrases
);
Another is to combine INSTR()
and SUBSTR()
with a regular expression:
WITH phrases AS (
SELECT 'stackoverflow is awesome. Begin beginHello,
World!End end It has everything!End endTESTTESTTEST' AS phrase
FROM dual
)
SELECT REGEXP_REPLACE(SUBSTR(phrase, 1, INSTR(phrase, 'End end') + LENGTH('End end')), '.+Begin begin(.+)End end.+', '\1', 1, 1, 'n')
FROM phrases;
Try this regex:
([[:print:]]+Begin begin)(.+?)(End end[[:print:]]+)
SELECT regexp_replace(
phrase ,
'([[:print:]]+Begin begin)(.+?)(End end[[:print:]]+)',
'\2',
1, -- Start at the beginning of the phrase
0, -- Replace ALL occurences
'n' -- Let dot meta character matches new line character
)
FROM
(SELECT 'stackoverflow is awesome. Begin beginHello, '
|| chr(10)
|| ' World!End end It has everything!' AS phrase
FROM DUAL
)
The dot meta character (.
) matches any character in the database character set and the new line character. However, when regexp_replace is called, the match_parameter must contain n
switch for dot
matches new lines.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With