I have a file that contains several lines of strings written like this:
[(W)40(indo)25(ws )20(XP)111(, )20(with )20(the )20(fragment )20(enlar)18(ged )20(for )20(clarity )20(on )20(Fig. )] TJ
I need only the text inside the parentheses. I tried the following code:
import re
readstream = open ("E:\\New folder\\output5.txt","r").read()
stringExtract = re.findall('\[(.*?)\]', readstream, re.DOTALL)
string = re.compile ('\(.*?\)')
stringExtract2 = string.findall (str(stringExtract))
But some of the strings are missing from the output; for example, for the line above, the word (with) is not found. The order of the strings also differs from the file: for the strings (enlar) and (ged ) above, the second one, (ged ), appears before (enlar), like this: ( ged other strings ..... enlar). How can I fix these problems?
Without regexp (here s is the input string):
[p.split(')')[0] for p in s.split('(') if ')' in p]
Output:
['W', 'indo', 'ws ', 'XP', ', ', 'with ', 'the ', 'fragment ', 'enlar', 'ged ', 'for ', 'clarity ', 'on ', 'Fig. ']
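For reference, here is a self-contained sketch of the split-based approach applied to the sample line from the question. The function name `extract_parenthesized` is made up for illustration, and the sketch assumes the text never contains nested or escaped parentheses.

```python
def extract_parenthesized(s):
    """Return the text of every (...) group in s, in order of appearance.

    Assumes no nested or escaped parentheses inside the groups.
    """
    # Splitting on '(' leaves each group at the start of a piece;
    # the text up to the first ')' in that piece is the group's content.
    return [p.split(')')[0] for p in s.split('(') if ')' in p]


sample = ("[(W)40(indo)25(ws )20(XP)111(, )20(with )20(the )20(fragment )20"
          "(enlar)18(ged )20(for )20(clarity )20(on )20(Fig. )] TJ")

pieces = extract_parenthesized(sample)
joined = ''.join(pieces)
print(pieces)
print(joined)
```

Joining the pieces with an empty string rebuilds the original sentence, because the fragments already carry their own trailing spaces.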
findall looks like your friend here. Don't you just want:
re.findall(r'\(.*?\)', readstream)
returns:
['(W)',
'(indo)',
'(ws )',
'(XP)',
'(, )',
'(with )',
'(the )',
'(fragment )',
'(enlar)',
'(ged )',
'(for )',
'(clarity )',
'(on )',
'(Fig. )']
Edit:
as @vikramis showed, to remove the parens, use a capture group:
re.findall(r'\((.*?)\)', readstream)
Also, note that it is common (but not requested here) to trim trailing whitespace with something like:
re.findall(r'\((.*?) *\)', readstream)
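Putting the two patterns side by side on the sample line from the question may make the difference clearer. This is only a sketch: the names `PAREN_TEXT` and `PAREN_TEXT_TRIMMED` are illustrative, and it assumes the parenthesized text contains no nested or escaped parentheses.

```python
import re

# Capture group keeps only the text between the parentheses.
PAREN_TEXT = re.compile(r'\((.*?)\)')
# Same idea, but trailing spaces before ')' are left out of the capture.
PAREN_TEXT_TRIMMED = re.compile(r'\((.*?) *\)')

sample = ("[(W)40(indo)25(ws )20(XP)111(, )20(with )20(the )20(fragment )20"
          "(enlar)18(ged )20(for )20(clarity )20(on )20(Fig. )] TJ")

pieces = PAREN_TEXT.findall(sample)
trimmed = PAREN_TEXT_TRIMMED.findall(sample)

# The untrimmed pieces still carry their spacing, so a plain join
# rebuilds the sentence in file order.
text = ''.join(pieces)

print(pieces)
print(trimmed)
print(text)
```

Running findall directly on the text, rather than on str() of an intermediate list, keeps the matches in the order they appear in the file. Trimming is convenient for word lists, but it discards the spacing needed to rejoin fragments such as (enlar) and (ged ).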