
Question

I have a file that contains several lines of strings written as:

[(W)40(indo)25(ws )20(XP)111(, )20(with )20(the )20(fragment )20(enlar)18(ged )20(for )20(clarity )20(on )20(Fig. )] TJ

I need only the text inside the parentheses. I tried the following code:

import re

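# raw string so the backslashes in the Windows path are not read as escape sequences
with open(r"E:\New folder\output5.txt", "r") as f:
    readstream = f.read()

# contents of each [...] array
stringExtract = re.findall(r'\[(.*?)\]', readstream, re.DOTALL)
# each (...) chunk, searched in the string form of that list
string = re.compile(r'\(.*?\)')
stringExtract2 = string.findall(str(stringExtract))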

But some strings (text) are missing from the output; for example, in the line above the word (with) is not found. The order of the strings also differs from the file: of the strings (enlar) and (ged ) above, the second one (ged ) appears before (enlar), like this: ( ged other strings ..... enlar). How can I fix these problems?
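A minimal sketch of the same two-step idea from the question, with one change: the parenthesis pattern is run on each bracketed match directly instead of on str() of the whole list. The two-line file content below is an assumption (the sample line split in two for illustration); whether this resolves the missing words in the real file cannot be told from the single line shown.

```python
import re

# Stand-in for the file content (assumption): the sample line from the
# question, split over two TJ lines purely for illustration.
readstream = (
    "[(W)40(indo)25(ws )20(XP)111(, )20(with )] TJ\n"
    "[(the )20(fragment )20(enlar)18(ged )20(for )20(clarity )20(on )20(Fig. )] TJ\n"
)

# Step 1 (as in the question): the contents of every [...] array, in file order.
arrays = re.findall(r'\[(.*?)\]', readstream, re.DOTALL)

# Step 2: run the parenthesis pattern on each array string directly,
# instead of on str(arrays), and collect the captured text in order.
chunks = []
for array in arrays:
    chunks.extend(re.findall(r'\((.*?)\)', array))

print(chunks)
```

Because re.findall returns matches left to right and the loop visits the arrays in the order they were found, the chunks come out in file order.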

ekhumoro · Accepted Answer

Without regexp:

# s holds one line of the file, e.g. the sample line above
[p.split(')')[0] for p in s.split('(') if ')' in p]

Output:

['W', 'indo', 'ws ', 'XP', ', ', 'with ', 'the ', 'fragment ', 'enlar', 'ged ', 'for ', 'clarity ', 'on ', 'Fig. ']
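A runnable version of the one-liner, assuming `s` holds the sample line from the question, with the intermediate split printed so each step is visible:

```python
# s is assumed to hold one line of the file; here it is the sample from the question.
s = "[(W)40(indo)25(ws )20(XP)111(, )20(with )20(the )20(fragment )20(enlar)18(ged )20(for )20(clarity )20(on )20(Fig. )] TJ"

# Splitting on '(' leaves the text of one chunk at the start of every piece
# except the first ('['), which contains no ')'.
pieces = s.split('(')
print(pieces[:3])   # ['[', 'W)40', 'indo)25']

# Keep the pieces that contain ')' and cut each one at its first ')',
# which drops the spacing numbers that follow each chunk.
parts = [p.split(')')[0] for p in pieces if ')' in p]
print(parts)
```

str.split returns the pieces left to right, so the result keeps the order of the original line.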

Phil Cooper · Answer

findall looks like your friend here. Don't you just want:

re.findall(r'\(.*?\)', readstream)

returns:

['(W)',
 '(indo)',
 '(ws )',
 '(XP)',
 '(, )',
 '(with )',
 '(the )',
 '(fragment )',
 '(enlar)',
 '(ged )',
 '(for )',
 '(clarity )',
 '(on )',
 '(Fig. )']
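The `?` in `.*?` is what makes this work: it makes the match lazy, so each match stops at the first `)` after its `(`. A short sketch on the sample line (assumed as input) comparing it with the greedy form:

```python
import re

line = "[(W)40(indo)25(ws )20(XP)111(, )20(with )20(the )20(fragment )20(enlar)18(ged )20(for )20(clarity )20(on )20(Fig. )] TJ"

# Lazy: .*? stops at the first ')' after each '(' -> one match per chunk.
lazy = re.findall(r'\(.*?\)', line)

# Greedy: .* runs to the LAST ')' on the line -> one long match
# spanning the whole array.
greedy = re.findall(r'\(.*\)', line)

print(len(lazy))    # 14
print(len(greedy))  # 1
```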

Edit: as @vikramis showed, to remove the parens, use: re.findall(r'\((.*?)\)', readstream). Also, note that it is common (but not requested here) to trim trailing whitespace with something like:

re.findall(r'\((.*?) *\)', readstream)
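A small sketch, using the sample line as input, of what the capture group and the optional trim each return. In this sample the spaces between words live inside the chunks, so joining the untrimmed captures rebuilds the sentence while joining the trimmed ones does not; whether trimming is wanted depends on what the text is used for.

```python
import re

line = "[(W)40(indo)25(ws )20(XP)111(, )20(with )20(the )20(fragment )20(enlar)18(ged )20(for )20(clarity )20(on )20(Fig. )] TJ"

keep_spaces = re.findall(r'\((.*?)\)', line)   # group only: parentheses dropped
trimmed = re.findall(r'\((.*?) *\)', line)     # trailing spaces dropped as well

print(repr(''.join(keep_spaces)))  # 'Windows XP, with the fragment enlarged for clarity on Fig. '
print(repr(''.join(trimmed)))      # 'WindowsXP,withthefragmentenlargedforclarityonFig.'
```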

python - Return Text Between Parenthesis

Tags: regex, python-2.7

ANjell
