I have something like this:
Othername California (2000) (T) (S) (ok) {state (#2.1)}
Is there a regex that would produce:
Othername California ok 2.1
I.e. I would like to keep the number inside the round parentheses that are themselves inside {}, and keep the text "ok" when it appears inside (). I specifically need the string "ok" to be printed if it is present in a line, but I would like to get rid of any other text in parentheses, e.g. (V), (S) or (2002).
I am aware that regex is probably not the most efficient way to handle this kind of problem.
Any help would be appreciated.
EDIT:
The string may vary: if a piece of information is unavailable, it is simply left out of the line. The text itself also changes (e.g. I don't have "state" on every line). So one can have, for example:
Name1 Name2 Name3 (2000) (ok) {edu (#1.1)}
Name1 Name2 (2002) {edu (#1.1)}
Name1 Name2 Name3 (2000) (V) {variation (#4.12)}
(.+)\s+\(\d+\).+?(?:\(([^)]{2,})\)\s+(?={))?\{.+\(#(\d+\.\d+)\)\}
Name1 Name2 Name3 (2000) {Education (#3.2)}
Name1 Name2 Name3 (2000) (ok) {edu (#1.1)}
Name1 Name2 (2002) {edu (#1.1)}
Name1 Name2 Name3 (2000) (V) {variation (#4.12)}
Othername California (2000) (T) (S) (ok) {state (#2.1)}
>>> regex = re.compile("(.+)\s+\(\d+\).+?(?:\(([^)]{2,})\)\s+(?={))?\{.+\(#(\d+\.\d+)\)\}")
>>> r = regex.search(string)
>>> r
<_sre.SRE_Match object at 0x54e2105f36c16a48>
>>> regex.match(string)
<_sre.SRE_Match object at 0x54e2105f36c169e8>
# Run findall
>>> regex.findall(string)
[
 (u'Name1 Name2 Name3' , u'' , u'3.2'),
 (u'Name1 Name2 Name3' , u'ok', u'1.1'),
 (u'Name1 Name2' , u'' , u'1.1'),
 (u'Name1 Name2 Name3' , u'' , u'4.12'),
 (u'Othername California', u'ok', u'2.1')
]
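To get from those tuples to the single-line form asked for in the question ("Othername California ok 2.1"), one option is to join the non-empty groups with spaces. This is only a sketch: the helper name `summarize` is made up for illustration, and it reuses the pattern above unchanged.

```python
import re

# Same pattern as above, written as a raw string.
pattern = re.compile(
    r"(.+)\s+\(\d+\).+?(?:\(([^)]{2,})\)\s+(?={))?\{.+\(#(\d+\.\d+)\)\}"
)

def summarize(line):
    """Hypothetical helper: return 'name [label] number', or None if no match."""
    m = pattern.search(line)
    if m is None:
        return None
    # The optional middle group is None when absent, so filter it out.
    return " ".join(part for part in m.groups() if part)

if __name__ == "__main__":
    sample = [
        "Name1 Name2 (2002) {edu (#1.1)}",
        "Othername California (2000) (T) (S) (ok) {state (#2.1)}",
    ]
    for line in sample:
        print(summarize(line))
```

Note that `[^)]{2,}` captures any label of two or more characters that sits directly before the `{`, not only "ok". If only "ok" should be kept, replacing that part with a literal `ok` may be closer to what was asked.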
Try this one:
import re
thestr = 'Othername California (2000) (T) (S) (ok) {state (#2.1)}'
regex = r'''
    ([^(]*)       # match anything but a (
    \             # a space
    (?:           # non capturing parentheses
      \([^(]*\)   # parentheses
      \           # a space
    ){3}          # three times
    \(([^(]*)\)   # capture fourth parentheses contents
    \             # a space
    {             # opening {
    [^}]*         # anything but }
    \(\#          # opening ( followed by #
    ([^)]*)       # match anything but )
    \)            # closing )
    }             # closing }
'''
match = re.match(regex, thestr, re.X)
print(match.groups())
Output:
('Othername California', 'ok', '2.1')
And here's the compressed version:
import re
thestr = 'Othername California (2000) (T) (S) (ok) {state (#2.1)}'
regex = r'([^(]*) (?:\([^(]*\) ){3}\(([^(]*)\) {[^}]*\(\#([^)]*)\)}'
match = re.match(regex, thestr)
print(match.groups())
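One caveat: the `{3}` quantifier assumes exactly three parenthesised groups before `(ok)`, so this pattern will not match the shorter lines from the question's edit. A possible variation, not part of the original answer, is to capture all the `(...)` groups as one chunk and then check whether `(ok)` is among them. The names `LINE_RE` and `extract` below are hypothetical.

```python
import re

LINE_RE = re.compile(r'''
    ([^(]*?)               # name: text before the first parenthesis
    \s*
    ((?:\([^()]*\)\s*)*)   # any number of (...) groups, kept as one chunk
    \{[^}]*                # opening brace and the label inside it
    \(\#([^)]*)\)          # (#2.1) -> capture 2.1
    \}
''', re.X)

def extract(line):
    """Hypothetical helper: return (name, 'ok' or '', number), or None."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    name, groups, number = m.groups()
    # Keep only the "ok" marker; every other (...) group is discarded.
    flag = 'ok' if '(ok)' in groups else ''
    return (name, flag, number)

if __name__ == "__main__":
    print(extract('Othername California (2000) (T) (S) (ok) {state (#2.1)}'))
    print(extract('Name1 Name2 (2002) {edu (#1.1)}'))
```

This trades a single all-in-one regex for a regex plus a plain substring test, which tends to be easier to adjust if other markers besides "ok" need to be kept later.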