
How to use a regex capture group in redshift (or alternative)

I have a column in a Redshift table with values that look like the following:

abcd1234df-TEXT_I-WANT

Each of the first 10 characters can be a letter or a digit.

With a capture-group regex, I would use a poorly written expression like (\w\w\w\w\w\w\w\w\w\w\W)(.*) and grab the 2nd group.

But I'm having trouble implementing this in Redshift, so I'm not sure how to grab only the text after the first hyphen.

user1874064 asked Jun 06 '18 00:06



3 Answers

As another answer notes, regex might be overkill here. However, it can be useful in some cases.

Here's a basic replace pattern:

SELECT
    regexp_replace(
      'abcd1234df-TEXT_I-WANT'  -- use your input column here instead
    , '^[a-z0-9]{10}-(.*)$'     -- matches whole string, captures "TEXT_I-WANT" in $1
    , '$1'                      -- inserts $1 to return TEXT_I-WANT
    )
;
wp78de answered Nov 01 '22 11:11


@wp78de gives very good advice in suggesting REGEXP_REPLACE. It lets you choose which capture group to return. Using your regex, it would look like this, although you don't need 2 groups here; one is sufficient.

select 
  regexp_replace(
    'abcd1234df-TEXT_I-WANT',
    '(\\w\\w\\w\\w\\w\\w\\w\\w\\w\\w\\W)(.*)', 
    '$2' -- replacement selecting 2nd capture group
  );

Another option, although less flexible, is REGEXP_SUBSTR with the e parameter set (extract a substring using a subexpression). It lets you select a substring, but only that of the first capture group in your regex. Because the parameters argument comes after position and occurrence, you also have to pass those two explicitly, using their default value of 1.

Using the regex you suggested, but with only one group:

select 
  regexp_substr(
    'abcd1234df-TEXT_I-WANT',
    '\\w\\w\\w\\w\\w\\w\\w\\w\\w\\w\\W(.*)', 
    1, -- position
    1, -- occurrence
    'e' -- parameters
  );
botchniaque answered Nov 01 '22 09:11


Regular expressions might be overkill. Basic string operations are good enough:

select substring(col from position('-' in col) + 1)
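
As a minimal sketch of how this behaves, the query below applies the same expression to two made-up values. The sample_rows CTE and its contents are assumptions for illustration and are not part of the original table. POSITION should return 11 for the first value, so the substring starts at character 12. For a value with no hyphen, POSITION returns 0 and the whole string comes back unchanged, which may or may not be what you want.

```sql
-- Hypothetical sample rows standing in for the real table and column.
with sample_rows as (
    select 'abcd1234df-TEXT_I-WANT' as col
    union all
    select 'no_hyphen_here' as col
)
select
    col,
    position('-' in col) as hyphen_pos,                   -- index of the first hyphen, 0 if there is none
    substring(col from position('-' in col) + 1) as tail  -- everything after the first hyphen
from sample_rows;
```

Only the first hyphen matters, so later hyphens (as in TEXT_I-WANT) stay in the result.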
Gordon Linoff answered Nov 01 '22 10:11