
Python - Return Text Between Parentheses

I have a file that contains several lines of strings written as:

[(W)40(indo)25(ws )20(XP)111(, )20(with )20(the )20(fragment )20(enlar)18(ged )20(for )20(clarity )20(on )20(Fig. )] TJ

I need only the text inside the parentheses. I tried the following code:

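import re

# Read the whole file into a single string
readstream = open("E:\\New folder\\output5.txt", "r").read()

# Grab everything between [ and ], then search that result for (...) groups
stringExtract = re.findall(r'\[(.*?)\]', readstream, re.DOTALL)
string = re.compile(r'\(.*?\)')
stringExtract2 = string.findall(str(stringExtract))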

However, some of the strings are missing from the output; for example, for the string above, the word (with) is not found. The order of the strings also differs from the file: for (enlar) and (ged ) above, the second one, (ged ), appears before (enlar), like this: ( ged other strings ..... enlar). How can I fix these problems?

asked Dec 02 '14 22:12 by ANjell


2 Answers

Without a regexp, assuming s holds the string to parse:

[p.split(')')[0] for p in s.split('(') if ')' in p]

Output:

['W', 'indo', 'ws ', 'XP', ', ', 'with ', 'the ', 'fragment ', 'enlar', 'ged ', 'for ', 'clarity ', 'on ', 'Fig. ']
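
As a minimal, self-contained sketch of the same split-based idea: the one-liner never defines s, so the sample line from the question is assigned to it here, and the helper name extract_parenthesized is hypothetical, introduced only for illustration. Joining the pieces is an extra step that is not part of the answer; it simply shows that the fragments come back in their original order.

```python
# Sample line copied from the question; `s` is the name the one-liner expects.
s = ("[(W)40(indo)25(ws )20(XP)111(, )20(with )20(the )20(fragment )20"
     "(enlar)18(ged )20(for )20(clarity )20(on )20(Fig. )] TJ")


def extract_parenthesized(line):
    """Return the text of each (...) group in order, without a regexp.

    Hypothetical helper: it wraps the list comprehension from the answer.
    """
    # Every chunk that follows a '(' starts with the wanted text;
    # cutting it at the first ')' drops the number that comes after.
    return [chunk.split(')')[0] for chunk in line.split('(') if ')' in chunk]


pieces = extract_parenthesized(s)

# Joining the pieces (an added step, not in the answer) rebuilds the sentence.
joined = ''.join(pieces)
print(pieces)
print(joined)
```

Note that this approach assumes the text contains no nested or escaped parentheses.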
answered Oct 02 '22 09:10 by ekhumoro


findall looks like your friend here. Don't you just want:

re.findall(r'\(.*?\)',readstream)

returns:

['(W)',
 '(indo)',
 '(ws )',
 '(XP)',
 '(, )',
 '(with )',
 '(the )',
 '(fragment )',
 '(enlar)',
 '(ged )',
 '(for )',
 '(clarity )',
 '(on )',
 '(Fig. )']

Edit: as @vikramis showed, to remove the parens, use: re.findall(r'\((.*?)\)', readstream). Also, note that it is common (but not requested here) to trim trailing whitespace with something like:

re.findall(r'\((.*?) *\)', readstream)
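
To compare the three patterns side by side, here is a small sketch that runs them on a shortened copy of the sample line. Assigning that string to readstream is an assumption made so the example runs without the original file; in the question, readstream holds the file's contents.

```python
import re

# Shortened copy of the sample line; stands in for the file contents.
readstream = "[(W)40(indo)25(ws )20(XP)111(, )20(with )20(the )] TJ"

# 1. No capture group: each match keeps its parentheses.
with_parens = re.findall(r'\(.*?\)', readstream)

# 2. Capture group: findall returns only the text inside the parentheses.
without_parens = re.findall(r'\((.*?)\)', readstream)

# 3. Capture group followed by ' *': trailing spaces fall outside the group.
trimmed = re.findall(r'\((.*?) *\)', readstream)

print(with_parens)     # ['(W)', '(indo)', '(ws )', '(XP)', '(, )', '(with )', '(the )']
print(without_parens)  # ['W', 'indo', 'ws ', 'XP', ', ', 'with ', 'the ']
print(trimmed)         # ['W', 'indo', 'ws', 'XP', ',', 'with', 'the']
```

Because findall scans the string from left to right, the matches come back in the same order as they appear in the input. Trimming is probably not what you want if the fragments will be joined back into words, since the trailing spaces mark the word boundaries.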
answered Oct 02 '22 09:10 by Phil Cooper