Python Regex, optional word in brackets?

Tags: python, regex

I have a quick question on regex. I have a certain string to match, which can take any of the forms shown below:

"[someword] This Is My Name 2010"
or
"This Is My Name 2010"
or
"(someword) This Is My Name 2010"

Basically if given any of the strings above, I want to only keep "This Is My Name" and "2010".

What I have now is the pattern below; I pass it to re.search and then call result.group() to get the answer:

'[\]\)]? (.+) ([0-9]{4})\D'

Basically it works with the first and third case, by allowing me to optionally match the end bracket, have a space character, and then match "This Is My Name".

However, with the second case, it only matches "Is My Name". I think this is because of the space between the '?' and '(.+)'.

Is there a way to deal with this issue in pure regex?

One way I can think of is to add an "if" statement to determine if the word starts with a [ or ( before using the appropriate regex.

hkcode asked Apr 09 '26 20:04


2 Answers

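The pattern that you tried, [\]\)]? (.+) ([0-9]{4})\D, optionally matches only a closing square bracket or parenthesis and then requires a space. In the second string there is no space before "This", so the match can only start at the first space in the string, which is why group 1 begins at "Is". The \D at the end also requires a character that is not a digit after the 4 digits, so the pattern cannot match when the string ends with the year.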
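To make that behaviour concrete, here is a minimal sketch that runs the original pattern against the example strings. The trailing "." is not in the question; it is added here only as an assumption, because \D needs some non-digit character after the year.

```python
import re

# The asker's original pattern, written as a raw string.
original = re.compile(r'[\]\)]? (.+) ([0-9]{4})\D')

# Nothing follows the year, so the trailing \D cannot match.
print(original.search("This Is My Name 2010"))  # None

# With a trailing "." (added for illustration only) the pattern matches,
# but the match can only start at the first space, so "This" is lost.
m = original.search("This Is My Name 2010.")
print(m.group(1), m.group(2))  # Is My Name 2010

# With a bracketed prefix, the match starts at the closing bracket.
m = original.search("[someword] This Is My Name 2010.")
print(m.group(1), m.group(2))  # This Is My Name 2010
```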


You can optionally match the whole (...) or [...] part, including the space after it, before the first capturing group, because [\]\)] only matches the closing bracket.

Then you can capture everything that follows in group 1, match the last 4 digits in group 2, and end with a word boundary instead of \D.

(?:\([^()\n]*\) |\[[^][\n]*\] )?(.+) ([0-9]{4})\b
  • (?: Non-capturing group
    • \([^()\n]*\) Match (...) followed by a space
    • | Or
    • \[[^][\n]*\] Match [...] followed by a space
  • )? Close the group and make it optional
  • (.+) Capture group 1, match any char except a newline 1+ times, followed by a space
  • ([0-9]{4})\b Capture group 2, match 4 digits followed by a word boundary
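As a rough sketch of how the pattern above could be used from Python, the snippet below applies it to the three strings from the question. The helper name extract and the samples list are made up for illustration.

```python
import re

# The pattern from this answer, as a raw string.
pattern = re.compile(r'(?:\([^()\n]*\) |\[[^][\n]*\] )?(.+) ([0-9]{4})\b')

samples = [
    "[someword] This Is My Name 2010",
    "This Is My Name 2010",
    "(someword) This Is My Name 2010",
]

def extract(text):
    """Return (name, year), or None if the pattern does not match."""
    m = pattern.search(text)
    return m.groups() if m else None

for s in samples:
    print(extract(s))  # ('This Is My Name', '2010') each time
```

Using m.groups() instead of m.group() returns only the two captured parts, without the optional bracketed prefix.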


Note that .+ is greedy: it first matches until the end of the line and then backtracks until the last occurrence of a space followed by 4 digits. If the first occurrence should be matched instead, you could make it non-greedy: .+?
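The difference between the greedy and non-greedy variants only shows up when a string contains more than one 4-digit number. A small sketch, using a made-up input with two years (not from the question):

```python
import re

prefix = r'(?:\([^()\n]*\) |\[[^][\n]*\] )?'
greedy = re.compile(prefix + r'(.+) ([0-9]{4})\b')
lazy = re.compile(prefix + r'(.+?) ([0-9]{4})\b')

# Hypothetical input containing two 4-digit numbers.
text = "[someword] This Is My Name 2010 Remastered 2015"

print(greedy.search(text).groups())  # ('This Is My Name 2010 Remastered', '2015')
print(lazy.search(text).groups())    # ('This Is My Name', '2010')
```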

The fourth bird answered Apr 12 '26 11:04


You can use re.sub to replace the first portion of the sentence if it starts with (square or round) brackets, with an empty string. No if statement is needed:

import re

s1 = "[someword] This Is My Name 2010"
s2 = "This Is My Name 2010"
s3 = "(someword) This Is My Name 2010"

# Raw string, so the backslashes reach the regex engine unchanged
reg = r'\[.*?\] |\(.*?\) '

res1 = re.sub(reg, '', s1)
print(res1)

res2 = re.sub(reg, '', s2)
print(res2)

res3 = re.sub(reg, '', s3)
print(res3)

OUTPUT

This Is My Name 2010
This Is My Name 2010
This Is My Name 2010
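The pattern above is not anchored, so it would also remove a bracketed word in the middle of the string, and it leaves the name and the year in one string. A possible variation, sketched under the assumption that the year is always the last space-separated token; the names LEADING_TAG and split_name_year are invented for this example:

```python
import re

# Anchored with ^ so only a bracketed word at the very start is removed.
LEADING_TAG = re.compile(r'^(?:\[.*?\] |\(.*?\) )')

def split_name_year(text):
    """Drop a leading [..] or (..) tag, then split off the final token.

    Assumes the remaining text contains at least one space.
    """
    cleaned = LEADING_TAG.sub('', text)
    name, year = cleaned.rsplit(' ', 1)
    return name, year

print(split_name_year("[someword] This Is My Name 2010"))  # ('This Is My Name', '2010')
print(split_name_year("This (Live) Name 2010"))            # ('This (Live) Name', '2010')
```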
Nir Alfasi answered Apr 12 '26 11:04