Python Regex, optional word in brackets?

Question

I have a quick question about regex. I have a certain string to match, shown below:

"[someword] This Is My Name 2010"
or
"This Is My Name 2010"
or
"(someword) This Is My Name 2010"

Basically if given any of the strings above, I want to only keep "This Is My Name" and "2010".

Here is what I have now. I will use result = re.search(...) and then result.group() to get the answer:

'[\]\)]? (.+) ([0-9]{4})\D'

Basically it works with the first and third case, by allowing me to optionally match the end bracket, have a space character, and then match "This Is My Name".

However, with the second case, it only matches "Is My Name". I think this is because of the space between the '?' and '(.+)'.

Is there a way to deal with this issue in pure regex?

One way I can think of is to add an "if" statement to determine if the word starts with a [ or ( before using the appropriate regex.

The fourth bird · Accepted Answer

The pattern that you tried, [\]\)]? (.+) ([0-9]{4})\D, optionally matches a closing square bracket or parenthesis. The \D at the end requires a character that is not a digit after the year, so the pattern cannot match when the year is the last thing in the string.
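To see what the trailing \D does in practice, here is a minimal sketch using the pattern from the question. The second input, with " extra" after the year, is a hypothetical string added only so that \D has something to match.

```python
import re

# The pattern from the question, unchanged.
pattern = re.compile(r'[\]\)]? (.+) ([0-9]{4})\D')

# The year is the last thing in the string, so \D has nothing left to match.
no_match = pattern.search("This Is My Name 2010")
print(no_match)  # None

# Hypothetical input with trailing text, so \D can match the space after the year.
m = pattern.search("This Is My Name 2010 extra")
print(m.groups())  # ('Is My Name', '2010')
```

The second result also shows the problem from the question: the mandatory space before (.+) forces the match to start after "This".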

You can optionally match the whole (...) or [...] part before the first capturing group, as [\]\)] matches only the closing bracket or parenthesis.

Then you can capture all that follows in group 1, followed by matching the last 4 digits in group 2 and add a word boundary.

(?:\([^()\n]*\) |\[[^][\n]*\] )?(.+) ([0-9]{4})\b

(?: Non capture group
- \([^()\n]*\) Match (...) followed by a space
- | Or
- \[[^][\n]*\] Match [...] followed by a space
)? Close group and make it optional
(.+) Capture group 1, match 1+ times any char except a newline, followed by a space
([0-9]{4})\b Capture group 2, match 4 digits followed by a word boundary
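As a quick check, the pattern can be run against the three strings from the question. This is a minimal sketch using re.search as in the question; the variable names are only illustrative.

```python
import re

# Optional "(...) " or "[...] " prefix, then the name in group 1 and the year in group 2.
pattern = re.compile(r'(?:\([^()\n]*\) |\[[^][\n]*\] )?(.+) ([0-9]{4})\b')

samples = [
    "[someword] This Is My Name 2010",
    "This Is My Name 2010",
    "(someword) This Is My Name 2010",
]

results = [pattern.search(s).groups() for s in samples]
for s, groups in zip(samples, results):
    print(s, "->", groups)
# Each line ends with: -> ('This Is My Name', '2010')
```

Use .group(1) and .group(2) on the match object to get the name and the year separately.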


Note that .+ will first match until the end of the line and then backtrack to the last occurrence of a space followed by 4 digits. If the first occurrence should be matched instead, you could make it non-greedy: .+?
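The difference only shows up when a line contains more than one 4-digit number. A small sketch, assuming a hypothetical input with two years (this string is not from the question):

```python
import re

prefix = r'(?:\([^()\n]*\) |\[[^][\n]*\] )?'
greedy = re.compile(prefix + r'(.+) ([0-9]{4})\b')   # backtracks from the end of the line
lazy = re.compile(prefix + r'(.+?) ([0-9]{4})\b')    # stops at the first space + 4 digits

# Hypothetical input containing two 4-digit numbers.
s = "This Is My Name 2010 Remastered 2015"

greedy_groups = greedy.search(s).groups()
lazy_groups = lazy.search(s).groups()

print(greedy_groups)  # ('This Is My Name 2010 Remastered', '2015')
print(lazy_groups)    # ('This Is My Name', '2010')
```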

Nir Alfasi · Answer

You can use re.sub to replace the first portion of the sentence if it starts with (square or round) brackets, with an empty string. No if statement is needed:

import re

s1 = "[someword] This Is My Name 2010"
s2 = "This Is My Name 2010"
s3 = "(someword) This Is My Name 2010"

reg = r'\[.*?\] |\(.*?\) '

res1 = re.sub(reg, '', s1)
print(res1)

res2 = re.sub(reg, '', s2)
print(res2)

res3 = re.sub(reg, '', s3)
print(res3)

OUTPUT

This Is My Name 2010
This Is My Name 2010
This Is My Name 2010
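One caveat: the pattern above is not anchored, so it would also remove a bracketed word in the middle of the sentence. If only a leading bracketed word should be stripped, a possible variation is to anchor the pattern with ^. This is a sketch of that variation, not part of the original answer, and the test string with "(My)" is hypothetical.

```python
import re

unanchored = r'\[.*?\] |\(.*?\) '
# Variation (an assumption, not from the answer): only strip a bracketed word at the start.
anchored = r'^(?:\[.*?\]|\(.*?\)) '

# Hypothetical input with brackets in the middle of the sentence.
s = "This Is (My) Name 2010"

res_unanchored = re.sub(unanchored, '', s)
res_anchored = re.sub(anchored, '', s)
res_leading = re.sub(anchored, '', "[someword] This Is My Name 2010")

print(res_unanchored)  # This Is Name 2010
print(res_anchored)    # This Is (My) Name 2010
print(res_leading)     # This Is My Name 2010
```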

Tags: python, regex

hkcode
