Suppose that I have the folliwing string
string = "serial 7's 93-86-79-72-65 very slow, recall 3/3 "
Now, I want to find the set of numbers using regular expressions in Python. Note that the numbers must be preceded by "serial 7's"
I have tried the following:
re.findall('(?<=serial 7\'s )(\d+, )', string)
re.findall('(?<=serial 7\'s )(\d+, )+', string)
Nothing seems to work. Note that there might be unknown number of integers we are trying to extract. I only want numbers with the specific pattern. Not other numbers that might be scattered within the text.
Expected output: ['93','86','79','72','65']
Another way to do it using one regular expression:
import re
string = "serial 7's 93-86-79-72-65 very slow, recall 3/3 "
regex = r"(?<=serial 7's) (\d+-?)+"
matches = re.finditer(regex, test_str, re.MULTILINE)
for match in matches:
integers = match.group(0).strip().split("-")
print(integers) # ['93', '86', '79', '72', '65']
My two cents, you could use the below pattern with re.search
:
\bserial 7's\s(\d+(?:-\d+)*)
import re
s = "serial 7's 93-86-79-72-65 very slow, recall 3/3 "
res = re.search(r"\bserial 7's\s(\d+(?:-\d+)*)", s)
if res:
print(res.group(1).split('-')) # ['93', '86', '79', '72', '65']
else:
print('No match')
I'd check if any match actually occurs first where the pattern must include numbers which, if there are multiple values, are delimited by an hyphen. Since you mentioned: "Note that there might be unknown number of integers we are trying to extract. I only want numbers with the specific pattern.".
\b
- Word boundary.serial 7's
- Match "serial 7's" literally.\s+
- One or more whitespace characters.(
- Open capture group.\d+
- Match at least a single digit.(?:-\d+)*
- Non-capture group for zero or more times an hyphen followed by at least a single digit.)
- Close capture group.Alternatively one could use regex
module instead and go with a non-fixed width positive lookbehind:
(?<=\bserial 7's\s+(?:\d+-)*)\d+
import regex
s = "serial 7's 93-86-79-72-65 very slow, recall 77 3/3 "
lst = regex.findall(r"(?<=\bserial 7's\s+(?:\d+-)*)\d+", s)
print(lst) # ['93', '86', '79', '72', '65']
(?<=
- Start of the positive lookbehind.
\b
- A word boudnary.serial 7's
- Literally "serial 7's".\s+
- One ore more whitespace characters.(?:
- Open non-capture group.
\d+-
- Match at least a single digit followed by a hyphen.)*
- Close non-capture group and match it zero or more times.)
- Close positive lookbehind.\d+
- Match at least a single digit.I would use re.findall
here combined with split
:
string = "serial 7's 93-86-79-72-65 very slow"
matches = re.findall(r"\bserial 7's (\S+)", string)
nums = matches[0].split('-')
print(nums)
This prints:
['93', '86', '79', '72', '65']
If you can make use of the regex module, you could also use \G
and \K
(?:\bserial 7's |\G(?!^))-?\K\d+
Explanation
(?:
Non capture group
\bserial 7's
Match serial 7's and space|
Or\G(?!^)
The \G
anchor matches at 2 positions: at the beginning of the string, or at the end of the previous match. We don't want the match to start at the beginning, so exclude that using a negative lookahead.)
-?\K
Match optional -
and reset the match buffer (forget what is matched until now)\d+
Match 1+ digitsRegex demo | Python demo
Example code
import regex
pattern = r"(?:\bserial 7's |\G(?!^))-?\K\d+"
string = "serial 7's 93-86-79-72-65 very slow, recall 3/3 "
print(regex.findall(pattern, string))
Output
['93', '86', '79', '72', '65']
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With