How to use a regex with Awk to extract the substring between parentheses?

Tags: substring, regex, indexing, awk

In the following Bash command line, I am able to obtain the index for the substring, when the substring is between double quotes.

text='123ABCabc((XYZabc((((((abc123(((123'

echo "$text" | awk '{ print index($0, "((((a") }'  # 20 is the result.

However, in my application, I will not know which character will appear where the "a" is in this example. I therefore wanted to replace the "a" with a regex that accepts any character other than "(", and I thought that /[^(]/ would be what I needed. However, I have been unable to get the Awk index function to work with any form of regex in place of the "((((a" in the example.

UPDATE: It was pointed out by William Pursell that the index operation does not accept a regex as the second operand.
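To illustrate the point about index, here is a minimal sketch that runs the literal lookup and a regex lookup side by side. It assumes a POSIX awk and uses match(), which does take a regex; the variable names literal_pos and regex_pos are only for illustration.

```shell
text='123ABCabc((XYZabc((((((abc123(((123'

# index() treats its second argument as a literal string, so this finds "((((a".
literal_pos=$(printf '%s\n' "$text" | awk '{ print index($0, "((((a") }')

# match() takes a regex: four "(" followed by any one character that is not "(".
regex_pos=$(printf '%s\n' "$text" | awk '{ print match($0, /\(\(\(\([^(]/) }')

echo "$literal_pos $regex_pos"
```

Both calls report the same position for this input, but only the match() version keeps working when the character after the parentheses changes.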

Ultimately, what I was trying to accomplish was to extract the substring that was located after four or more "(", followed by one or more ")". Dennis Williamson provided the solution with the following code:

echo 'dksjfkdj(((((((I-WANT-THIS-SUBSTRING)askdjflsdjf' | 
mawk '{match($0,/\(\(\(\([^()]*\)/); s = substr($0,RSTART, RLENGTH); gsub(/[()]/, "", s); print s}'

Thanks to all for their help!


asked May 31 '12 15:05

GaryH.

2 Answers

To get the position of the first non-open-parenthesis after a sequence of them:

$ echo "$text" | awk '{ print match($0, /\(\(\(\(([^(])/, arr); print arr[1, "start"]}'
20
24

This shows the position of the substring matching "(((([^(]" (20) and the position of the character after the parentheses (24).

The third (array) argument to match(), which makes this possible, is a GNU awk (gawk) extension.
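If gawk is not available, the same two numbers can probably be recovered from RSTART and RLENGTH, which POSIX awk sets after every match(). This is a sketch under that assumption, not part of the original answer; it relies on the non-parenthesis being the last character of the match, and the variable name positions is illustrative.

```shell
text='123ABCabc((XYZabc((((((abc123(((123'

positions=$(printf '%s\n' "$text" | awk '
    match($0, /\(\(\(\([^(]/) {
        # RSTART is where the match begins; the last matched character
        # is the non-parenthesis, at RSTART + RLENGTH - 1.
        print RSTART, RSTART + RLENGTH - 1
    }')

echo "$positions"
```

Using match() as the pattern means nothing is printed for lines that contain no run of four parentheses.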

Edit:

echo 'dksjfkdj(((((((I-WANT-THIS-SUBSTRING)askdjflsdjf' | 
    mawk '{match($0,/\(\(\(\([^()]*\)/); s = substr($0,RSTART, RLENGTH); gsub(/[()]/, "", s); print s}'
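A possible variant of the command above skips the gsub step by doing the arithmetic on RSTART and RLENGTH directly, and only prints when a match exists. This is a sketch assuming a POSIX awk; the function name extract_inner and the first test line are made up for illustration.

```shell
# Print the text between a run of four or more "(" and the next ")".
extract_inner() {
    awk 'match($0, /\(\(\(\([^()]*\)/) {
        # Skip the four opening parentheses and drop the closing one.
        print substr($0, RSTART + 4, RLENGTH - 5)
    }'
}

result=$(printf '%s\n' 'no parentheses here' \
    'dksjfkdj(((((((I-WANT-THIS-SUBSTRING)askdjflsdjf' | extract_inner)

echo "$result"
```

Because [^()]* cannot cross a parenthesis, the match always begins at the last four opening parentheses of the run, so the fixed offsets 4 and 5 hold however long the run is.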


answered Oct 22 '22 17:10

Dennis Williamson

If you want to match four or more open-parentheses in order to find the start of another substring that follows the match, you have to calculate the index yourself from RSTART and RLENGTH.

# Use GNU AWK to index the character after the end of a substring.
echo "$text" |
awk --re-interval 'match( $0, /\({4,}/ ) { print RSTART + RLENGTH }'

This should give you the correct starting index of the character following the sequence of parentheses, which in this case is 24.
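For awk implementations that lack interval expressions, the same calculation can be sketched by spelling the repetition out by hand. The example below is an assumption-laden variant rather than the answer's own code: it also prints the character found at the computed index, and the variable names next_char and after are illustrative.

```shell
text='123ABCabc((XYZabc((((((abc123(((123'

next_char=$(printf '%s\n' "$text" | awk '
    # Four literal "(" then zero or more extra ones: same as /\({4,}/.
    match($0, /\(\(\(\(\(*/) {
        after = RSTART + RLENGTH
        print after, substr($0, after, 1)
    }')

echo "$next_char"
```

If the run of parentheses ends the line, substr() returns an empty string, so the second field would be missing.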

answered Oct 22 '22 19:10

Todd A. Jacobs
