I have a quick question about regex. I have a certain kind of string to match, shown below:
"[someword] This Is My Name 2010"
or
"This Is My Name 2010"
or
"(someword) This Is My Name 2010"
Basically if given any of the strings above, I want to only keep "This Is My Name" and "2010".
Here is the pattern I have now. I use it with result = re.search and then result.group() to get the answer:
'[\]\)]? (.+) ([0-9]{4})\D'
Basically it works with the first and third case, by allowing me to optionally match the end bracket, have a space character, and then match "This Is My Name".
However, with the second case, it only matches "Is My Name". I think this is because of the space between the '?' and '(.+)'.
Is there a way to deal with this issue in pure regex?
One way I can think of is to add an "if" statement to determine if the word starts with a [ or ( before using the appropriate regex.
The pattern that you tried, [\]\)]? (.+) ([0-9]{4})\D, optionally matches a closing square bracket or parenthesis. Because of the \D at the end, it also requires a non-digit character after the 4 digits, so it cannot match when the year is the last thing in the string.
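To make that behaviour concrete, here is a minimal sketch that runs the pattern from the question. The trailing word "Edition" is a made-up input, added only so that the \D has something to match.

```python
import re

# The pattern from the question, written as a raw string.
original = re.compile(r'[\]\)]? (.+) ([0-9]{4})\D')

# Nothing follows the year, so the trailing \D cannot match.
no_trailing = original.search("This Is My Name 2010")
print(no_trailing)  # None

# Hypothetical input with extra text after the year.
# The mandatory space in the pattern is first found after "This",
# so group 1 loses the first word.
with_trailing = original.search("This Is My Name 2010 Edition")
print(with_trailing.groups())  # ('Is My Name', '2010')
```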
You can optionally match the (...) or [...] part before the first capturing group, as [])] only matches the optional closing one.
Then you can capture all that follows in group 1, followed by matching the last 4 digits in group 2 and add a word boundary.
(?:\([^()\n]*\) |\[[^][\n]*\] )?(.+) ([0-9]{4})\b
(?: Non-capturing group
\([^()\n]*\)  Match (...) followed by a space
| Or
\[[^][\n]*\]  Match [...] followed by a space
)? Close the group and make it optional
(.+)  Capture group 1, match any character except a newline 1+ times, followed by a space
([0-9]{4})\b Capture group 2, match 4 digits followed by a word boundary
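As a quick check, the pattern can be applied to the three strings from the question. This is only a sketch; the names pattern, samples and results are chosen for illustration.

```python
import re

# Optional "(...) " or "[...] " prefix, then the name, then a 4-digit year.
pattern = re.compile(r'(?:\([^()\n]*\) |\[[^][\n]*\] )?(.+) ([0-9]{4})\b')

samples = [
    "[someword] This Is My Name 2010",
    "This Is My Name 2010",
    "(someword) This Is My Name 2010",
]

results = []
for text in samples:
    match = pattern.search(text)
    # group(1) is the name, group(2) is the year
    results.append((match.group(1), match.group(2)))
    print(results[-1])
```

Each of the three inputs should give the same pair: the name in group 1 and the year in group 2.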
Note that .+ is greedy: it first matches until the end of the line and then backtracks to the last occurrence of 4 digits. If that should be the first occurrence instead, you could make it non-greedy with .+?
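The difference only shows up when a line contains more than one 4-digit number. A small sketch, using a made-up string with two years:

```python
import re

prefix = r'(?:\([^()\n]*\) |\[[^][\n]*\] )?'
greedy = re.compile(prefix + r'(.+) ([0-9]{4})\b')
lazy = re.compile(prefix + r'(.+?) ([0-9]{4})\b')

# Hypothetical input containing two years.
text = "[someword] This Is My Name 2010 Remastered 2015"

greedy_groups = greedy.search(text).groups()
lazy_groups = lazy.search(text).groups()

print(greedy_groups)  # ('This Is My Name 2010 Remastered', '2015')
print(lazy_groups)    # ('This Is My Name', '2010')
```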
You can use re.sub to replace the first portion of the sentence if it starts with (square or round) brackets, with an empty string. No if statement is needed:
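import re
s1 = "[someword] This Is My Name 2010"
s2 = "This Is My Name 2010"
s3 = "(someword) This Is My Name 2010"
# Raw string, so the backslashes reach the regex engine unchanged.
# Matches "[...] " or "(...) ", including the trailing space.
# The pattern is not anchored; prefix it with ^ to restrict it to the start.
reg = r'\[.*?\] |\(.*?\) '
res1 = re.sub(reg, '', s1)
print(res1)
res2 = re.sub(reg, '', s2)
print(res2)
res3 = re.sub(reg, '', s3)
print(res3)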
OUTPUT
This Is My Name 2010
This Is My Name 2010
This Is My Name 2010
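Since the question asks for "This Is My Name" and "2010" as separate values, the stripped string still has to be split. One possible sketch follows, assuming the year is always the last space-separated token. The helper name split_name_and_year is hypothetical, and the prefix pattern is anchored with ^ here, which differs slightly from the pattern above.

```python
import re

# Anchored variant of the prefix pattern: only strips a leading "[...] " or "(...) ".
PREFIX = re.compile(r'^(?:\[.*?\] |\(.*?\) )')

def split_name_and_year(text):
    """Strip an optional bracketed prefix, then split off the last token."""
    stripped = PREFIX.sub('', text)
    # Assumption: the year is the final space-separated token.
    name, _, year = stripped.rpartition(' ')
    return name, year

for s in ("[someword] This Is My Name 2010",
          "This Is My Name 2010",
          "(someword) This Is My Name 2010"):
    print(split_name_and_year(s))
```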