Regex: How to extract text from last parenthesis

Question

What regular expression extracts the string "(procedure)" (or, in general, the text inside the last pair of parentheses) from the strings below?

Example input strings:

Positron emission tomography using flutemetamol (18F) with computed tomography of brain (procedure)

Another example:

Urinary tract infection prophylaxis (procedure)

Possible approaches:

Go to the end of the text, search backwards for the first opening parenthesis, and take the substring from that position to the end of the text.
From the beginning of the text, find the last '(' character and take the substring from that position to the end.
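The second approach needs no regular expression at all. A minimal base-R sketch follows. The helper name last_paren_tag is hypothetical, and strings that contain no "(" are assumed to map to NA.

```r
# Hypothetical helper: find the last "(" in each string and return
# everything from that position to the end of the string.
last_paren_tag <- function(x) {
  # gregexpr() with fixed = TRUE gives the positions of every literal "(";
  # max() keeps the last one (-1 when the string has no "(").
  pos <- vapply(gregexpr("(", x, fixed = TRUE), max, integer(1))
  # substring() runs to the end of the string by default.
  ifelse(pos > 0, substring(x, pos), NA_character_)
}

last_paren_tag(c(
  "Positron emission tomography using flutemetamol (18F) with computed tomography of brain (procedure)",
  "Urinary tract infection prophylaxis (procedure)"
))
```

Both elements come back as "(procedure)", parentheses included, because the substring starts at the "(" itself.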

Other strings look like this (a different "tag" is extracted from each):

[1] "Xanthoma of eyelid (disorder)"                    "Ventricular tachyarrhythmia (disorder)"          
[3] "Abnormal urine odor (finding)"                    "Coloboma of iris (disorder)"                     
[5] "Macroencephaly (disorder)"                        "Right main coronary artery thrombosis (disorder)"

(A general regex is sought, but a solution in R is even better.)

Andy · Accepted Answer

If it is the last part of the string, then this regex will do it:

/\(([^()]*)\)$/

Explanation: look for an opening (, capture everything after it that isn't ( or ), and require a ) at the very end of the string.

https://regex101.com/r/cEsQtf/1
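In R the pattern is written without the /.../ delimiters and with doubled backslashes. A minimal sketch that uses regexec() to return just the capture group. The helper name extract_tag is hypothetical, and strings without a trailing parenthesised tag are assumed to map to NA.

```r
# Hypothetical helper applying the pattern \(([^()]*)\)$ in R.
extract_tag <- function(x) {
  m <- regexec("\\(([^()]*)\\)$", x)
  # regmatches() gives, per string, c(full match, group 1),
  # or character(0) when the pattern does not match.
  vapply(regmatches(x, m),
         function(g) if (length(g) == 2) g[2] else NA_character_,
         character(1))
}

extract_tag(c("Xanthoma of eyelid (disorder)",
              "Abnormal urine odor (finding)",
              "No tag here"))
```

This returns "disorder", "finding" and NA. The [^()] class is what keeps the match inside the last pair: it cannot run across an earlier ( or ).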

G5W · Answer

sub can do that with the right regex:

Text = c("Positron emission tomography using flutemetamol (18F) with computed tomography of brain (procedure)",
    "Urinary tract infection prophylaxis (procedure)",
    "Xanthoma of eyelid (disorder)",
    "Ventricular tachyarrhythmia (disorder)",
    "Abnormal urine odor (finding)",
    "Coloboma of iris (disorder)",
    "Macroencephaly (disorder)",
    "Right main coronary artery thrombosis (disorder)")
sub(".*\\((.*)\\).*", "\\1", Text)
[1] "procedure" "procedure" "disorder"  "disorder"  "finding"   "disorder" 
[7] "disorder"  "disorder"
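One caveat: sub() returns an element unchanged when the pattern does not match, so a string without parentheses comes back whole. If that matters, here is a small sketch, assuming NA is the desired result for such strings. The name get_tag is hypothetical.

```r
# Hypothetical wrapper: same sub() pattern, but NA when there is no "(...)".
get_tag <- function(x) {
  pattern <- ".*\\((.*)\\).*"
  ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
}

get_tag(c("Macroencephaly (disorder)", "Macroencephaly"))
```

The first element gives "disorder"; the second gives NA instead of "Macroencephaly".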

Addendum: Detailed explanation of the regex
The question asks for the content of the final set of parentheses in each string. The expression is slightly confusing because it uses parentheses in two different ways: one represents literal parentheses in the string being processed, and the other sets up a "capturing group", which is how we specify what part of the match should be returned. The expression is made up of five basic units:

1. Initial .*   - matches everything up to the final open parenthesis. 
   Note that this relies on "greedy matching": .* takes as much as it
   can and only backs off as far as the last (.
2. \\(   ...    \\)   - matches the final set of parentheses. 
   Because ( by itself means something else, we need to "escape" the 
   parentheses by preceding them with \. That is, we want the regular
   expression to say   \(  ...  \).  However, R processes backslashes
   in string literals itself: if we just typed "\(" and "\)", R would
   treat the \ as starting a string escape and reject \( as an
   unrecognized escape sequence. So we escape the backslash.
   R will interpret   \\(  ...  \\)   as  \(  ...  \)  meaning the literal
   characters ( & ). 
3. ( ... )       Inside the pair in part 2
   This makes use of the special meaning of parentheses. When we
   enclose an expression in parentheses, whatever it matches is stored
   for later use. That stored value is referred to as \1, which is what
   is used in the substitution pattern. Again, if we just wrote "\1", R
   would read it as an octal escape (the character with code 1).
   Writing "\\1" is interpreted as the character \ followed by 1, 
   i.e. \1.
4. Central .*    Inside the pair in part 3
   This is what we are looking for: all characters inside the parentheses.
5. Final   .*
   This matches any characters that may follow the final set of
   parentheses.
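Units 1 to 3 can be checked interactively. In this small sketch, cat() shows the regex text R actually receives from the doubled-backslash string. Swapping the greedy .* for the lazy .*? makes the match stop at the first "(" instead of the last; using perl = TRUE for the lazy version is an assumption of this illustration, not part of the answer.

```r
s <- "Positron emission tomography using flutemetamol (18F) with computed tomography of brain (procedure)"

# The R string "\\(" holds two characters: a backslash and "(".
cat(".*\\((.*)\\).*\n")      # prints .*\((.*)\).*
nchar("\\1")                 # 2: a backslash followed by 1
identical("\1", "\001")      # TRUE: undoubled, "\1" is an octal escape

# Greedy initial .* backs off only as far as the LAST "(":
sub(".*\\((.*)\\).*", "\\1", s)                  # "procedure"

# Lazy .*? stops at the FIRST "(", so the first group wins:
sub(".*?\\((.*?)\\).*", "\\1", s, perl = TRUE)   # "18F"
```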

The sub function uses this pattern to replace the matched text (in this case, the whole string) with the substitution pattern \1, i.e. the contents of the first (in our case, only) capturing group: the text inside the final parentheses.
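Because sub() replaces only the part of the string that matched, the initial and final .* are what make the result contain nothing but \1. This quick comparison uses a hypothetical input with extra text after the tag, which does not appear in the question:

```r
x <- "Coloboma of iris (disorder) left eye"   # hypothetical: text after the tag

sub(".*\\((.*)\\).*", "\\1", x)   # whole string matched: "disorder"
sub(".*\\((.*)\\)", "\\1", x)     # no final .*:   "disorder left eye"
sub("\\((.*)\\).*", "\\1", x)     # no initial .*: "Coloboma of iris disorder"
```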


Tags: regex, r

userJT

2 Answers

Andy

G5W
