Regex nested parenthesis in python

Question

I have something like this:

Othername California (2000) (T) (S) (ok) {state (#2.1)}

Is there a regex code to obtain:

Othername California ok 2.1

That is, I would like to keep the numbers inside the round parentheses that are nested within {}, and keep the text "ok" when it appears within (). I specifically need the string "ok" in the output whenever a line contains it, but I would like to drop other parenthesized text, e.g. (V), (S) or (2002).

I am aware that regex is probably not the most efficient way to handle this problem.

Any help would be appreciated.

EDIT:

The string may vary, since information that is unavailable is simply left out of the line. The text itself also varies (e.g. not every line has "state"). So one can have, for example:

Name1 Name2 Name3 (2000) (ok) {edu (#1.1)}
Name1 Name2 (2002) {edu (#1.1)}
Name1 Name2 Name3 (2000) (V) {variation (#4.12)}

Stephan · Accepted Answer

Regex

(.+)\s+\(\d+\).+?(?:\(([^)]{2,})\)\s+(?={))?\{.+\(#(\d+\.\d+)\)\}
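
The same expression can be spelled out with `re.VERBOSE` to show what each part does. This is an annotated sketch, with the literal parentheses escaped as `\(` and `\)`; the comments are one reading of the pattern rather than the answer author's own explanation, and the name `ANNOTATED` is arbitrary.

```python
import re

ANNOTATED = re.compile(r"""
    (.+)                    # group 1: the name, everything before the year
    \s+ \( \d+ \)           # whitespace, then a parenthesized number such as (2000)
    .+?                     # lazily skip the space and any tokens such as (T) or (S)
    (?:                     # optional block for a longer parenthesized token
        \( ([^)]{2,}) \)    # group 2: two or more characters, e.g. ok
        \s+ (?=\{)          # whitespace, and the next character must be a brace
    )?
    \{ .+                   # opening brace and the label inside it
    \( \# (\d+\.\d+) \)     # group 3: the number after the hash sign
    \}                      # closing brace
""", re.VERBOSE)

m = ANNOTATED.search("Othername California (2000) (T) (S) (ok) {state (#2.1)}")
print(m.groups())
```

Because group 2 is optional, it comes back as `None` from `groups()` (or as an empty string from `findall`) on lines that have no such token.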

Text used for test

Name1 Name2 Name3 (2000) {Education (#3.2)}
Name1 Name2 Name3 (2000) (ok) {edu (#1.1)}
Name1 Name2 (2002) {edu (#1.1)}
Name1 Name2 Name3 (2000) (V) {variation (#4.12)}
Othername California (2000) (T) (S) (ok) {state (#2.1)}

Test

>>> regex = re.compile(r"(.+)\s+\(\d+\).+?(?:\(([^)]{2,})\)\s+(?={))?\{.+\(#(\d+\.\d+)\)\}")
>>> r = regex.search(string)
>>> r
<_sre.SRE_Match object at 0x54e2105f36c16a48>
>>> regex.match(string)
<_sre.SRE_Match object at 0x54e2105f36c169e8>

# Run findall
>>> regex.findall(string)
[
   (u'Name1 Name2 Name3'   , u''  , u'3.2'),
   (u'Name1 Name2 Name3'   , u'ok', u'1.1'),
   (u'Name1 Name2'         , u''  , u'1.1'),
   (u'Name1 Name2 Name3'   , u''  , u'4.12'),
   (u'Othername California', u'ok', u'2.1')
]
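
To get from these tuples to the single-line output the question asks for, the non-empty groups can be joined with spaces. A minimal sketch, assuming Python 3 and the same pattern written as a raw string; the helper name `clean_line` is hypothetical and not part of the original answer.

```python
import re

# Same pattern as in the answer, with literal parentheses escaped.
PATTERN = re.compile(
    r"(.+)\s+\(\d+\).+?(?:\(([^)]{2,})\)\s+(?={))?\{.+\(#(\d+\.\d+)\)\}"
)

def clean_line(line):
    """Return 'name [token] number' for one line, or None if it does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    # Group 2 is None when the optional parenthesized token is absent,
    # so falsy groups are dropped before joining.
    return " ".join(g for g in m.groups() if g)

print(clean_line("Othername California (2000) (T) (S) (ok) {state (#2.1)}"))
print(clean_line("Name1 Name2 (2002) {edu (#1.1)}"))
```

Note that the pattern keeps any parenthesized token of two or more characters that sits directly before the `{`, not only `ok`; if only the literal `ok` should survive, the `[^)]{2,}` part would need to be narrowed.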

gitaarik · Answer

Try this one:

import re

thestr = 'Othername California (2000) (T) (S) (ok) {state (#2.1)}'

regex = r'''
    ([^(]*)             # match anything but a (
    \                   # a space
    (?:                 # non capturing parentheses
        \([^(]*\)       # parentheses
        \               # a space
    ){3}                # three times
    \(([^(]*)\)         # capture fourth parentheses contents
    \                   # a space
    {                   # opening {
        [^}]*           # anything but }
        \(\#            # opening ( followed by #
            ([^)]*)     # match anything but )
        \)              # closing )
    }                   # closing }
'''

match = re.match(regex, thestr, re.X)

print(match.groups())

Output:

('Othername California', 'ok', '2.1')

And here's the compressed version:

import re

thestr = 'Othername California (2000) (T) (S) (ok) {state (#2.1)}'
regex = r'([^(]*) (?:\([^(]*\) ){3}\(([^(]*)\) {[^}]*\(\#([^)]*)\)}'
match = re.match(regex, thestr)

print(match.groups())
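
The `{3}` quantifier ties this pattern to lines with exactly three parenthesized items before `(ok)`, so it does not match the shorter lines from the question's edit. One possible alternative, sketched under the assumption that `ok` is the only parenthesized word worth keeping, is to pull out each piece with its own small pattern. The function name `extract` is made up for illustration.

```python
import re

def extract(line):
    """Build 'name [ok] number' from one line using three small patterns."""
    # Everything before the first "(" or "{" is treated as the name.
    name = re.match(r"[^({]*", line).group().strip()
    # Keep the literal token "(ok)" only when the line contains it.
    ok = "ok" if re.search(r"\(ok\)", line) else None
    # The number sits in "(#...)" at the end of the braces.
    num = re.search(r"\{[^}]*\(#([^)]*)\)\}", line)
    parts = [name, ok, num.group(1) if num else None]
    # Drop whichever pieces are missing on this line.
    return " ".join(p for p in parts if p)

for line in [
    "Othername California (2000) (T) (S) (ok) {state (#2.1)}",
    "Name1 Name2 Name3 (2000) (V) {variation (#4.12)}",
]:
    print(extract(line))
```

Splitting the work this way trades one dense expression for three simple ones, which makes it easier to adjust a single piece when the line format changes.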
