I have a value in a Redshift column that looks like the following:
abcd1234df-TEXT_I-WANT
The first 10 characters can be either letters or digits.
If I were using a capture-group regex, I would use a (poorly written) expression like (\w\w\w\w\w\w\w\w\w\w\W)(.*)
and grab the second group.
But I'm having trouble implementing this in Redshift, so I'm not sure how to grab only the text after the first hyphen.
Amazon Redshift Regex: REGEXP_REPLACE takes the following arguments:
- source_string: the string in which the pattern is matched.
- pattern: a regex pattern.
- replace_string: a string or column name that replaces each occurrence of pattern.
- position: the position in source_string at which to start searching.
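To see how these arguments line up, here is a small sketch that passes all four positionally. The literal hyphen pattern and the start position of 12 are illustrative choices, not taken from the thread. Because the first hyphen is character 11 of the sample value, starting the search at character 12 should leave that hyphen alone and replace only the second one.

```sql
-- Illustrative only: replace hyphens with underscores, but begin searching
-- at character 12, just after the first hyphen (character 11).
select regexp_replace(
    'abcd1234df-TEXT_I-WANT',  -- source_string
    '-',                       -- pattern: a literal hyphen
    '_',                       -- replace_string
    12                         -- position: 1-based character to start searching from
);
```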
What is a group in regex? A group is a part of a regex pattern enclosed in the parentheses () metacharacters. We create a group by placing the regex pattern inside a pair of parentheses ( and ). For example, the regular expression (cat) creates a single group containing the letters 'c', 'a', and 't'.
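A quick way to see a group at work in Redshift is to capture part of a string and refer back to it with $1 in REGEXP_REPLACE, the same backreference style used in the answers below. The input 'concatenate' is just an illustrative value; since the .* on either side consume the rest of the string, the result should be only what group 1 captured, i.e. cat.

```sql
-- (cat) is group 1; the surrounding .* match everything else,
-- so the whole input is replaced by the text captured in group 1.
select regexp_replace('concatenate', '.*(cat).*', '$1');
```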
As mentioned before, regex might be overkill. However, it can be useful in some cases.
Here's a basic replace pattern:
SELECT
regexp_replace(
'abcd1234df-TEXT_I-WANT' -- use your input column here instead
, '^[A-Za-z0-9]{10}-(.*)$' -- matches whole string (letters of either case), captures "TEXT_I-WANT" in $1
, '$1' -- inserts $1 to return TEXT_I-WANT
)
;
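The same pattern can be applied to a whole column. In this sketch my_table and col are hypothetical names to replace with your own. One property worth knowing: when a value does not match the pattern, REGEXP_REPLACE has nothing to replace and returns the value unchanged, so malformed rows pass through as-is rather than becoming NULL.

```sql
-- my_table and col are hypothetical; substitute your own table and column.
select
    col,
    regexp_replace(
        col,
        '^[A-Za-z0-9]{10}-(.*)$',  -- 10 letters/digits, a hyphen, then the rest
        '$1'                       -- keep only the captured remainder
    ) as text_i_want
from my_table;
```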
@wp78de gives very good advice to use REGEXP_REPLACE. It allows you to choose the capture group. Using your regex, it would look like this, although you don't need two groups here; one is sufficient.
select
regexp_replace(
'abcd1234df-TEXT_I-WANT',
'(\\w\\w\\w\\w\\w\\w\\w\\w\\w\\w\\W)(.*)',
'$2' -- replacement selecting 2nd capture group
);
Another option, although less flexible, is using REGEXP_SUBSTR with the e parameter set (extract a substring using a subexpression). It allows you to select a substring, but only from the first capture group in your regex. You also have to set the position and occurrence parameters to their default of 1.
Using the regex you suggested, but with only one group:
select
regexp_substr(
'abcd1234df-TEXT_I-WANT',
'\\w\\w\\w\\w\\w\\w\\w\\w\\w\\w\\W(.*)',
1, -- position
1, -- occurrence
'e' -- parameters
);
Regular expressions might be overkill. Basic string operations are good enough:
select substring(col from position('-' in col) + 1) from my_table; -- col and my_table are placeholders for your own column and table
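One edge case to keep in mind with the string-function approach: POSITION returns 0 when the value contains no hyphen, so the expression above would return the whole string for such rows. A sketch that returns NULL for them instead, again with hypothetical my_table and col names:

```sql
-- my_table and col are hypothetical; substitute your own table and column.
select
    case
        when position('-' in col) > 0
            then substring(col from position('-' in col) + 1)
    end as text_i_want  -- no ELSE branch, so rows without a hyphen yield NULL
from my_table;
```

Note that SPLIT_PART(col, '-', 2) is not a drop-in alternative here: it stops at the next hyphen, so for the sample value it would return only TEXT_I.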