I have the following string:
Members: {'name': A, 'age': 30, 'gender': M, 'height': 1.56}, {'name': C, 'age': 20, 'gender': M, 'height': 1.8}, {'name': H, 'age': 45, 'gender': M, 'height': 1.97}, {'name': D, 'age': 23, 'gender': M, 'height': 1.68}; Place: 1//Members: {'name': S, 'age': 33, 'gender': M, 'height': 1.4}, {'name': C, 'age': 19, 'gender': M, 'height': 1.67}, {'name': A, 'age': 44, 'gender': M, 'height': 1.92}, {'name': C, 'age': 33, 'gender': M, 'height': 1.57}; Place: 2
I would like to know whether it is possible (and if so, how) to get a match like:
[['30', '20', '45', '23', '1'], ['33', '19', '44', '33', '2']] or
[(['30', '20', '45', '23'], '1'), (['33', '19', '44', '33'], '2')]
Or something similar (the resulting structure doesn't really matter). I just need all the ages that belong to one Place. I know that I can split the string and apply a regex to each part, or something along those lines, but my question is whether there is a way to do it in ONE single step using regex.
I would use findall to get all the "full matches". My problem is getting the first element of each "tuple" as a list.
If I do:
r = re.compile(r"'age': (\d+).*?; Place: (\d+).*?//")
g = r.findall("Members: {'name': A, 'age': 30, 'gender': M, 'height': 1.56}, {'name': C, 'age': 20, 'gender': M, 'height': 1.8}, {'name': H, 'age': 45, 'gender': M, 'height': 1.97}, {'name': D, 'age': 23, 'gender': M, 'height': 1.68}; Place: 1//Members: {'name': S, 'age': 33, 'gender': M, 'height': 1.4}, {'name': C, 'age': 19, 'gender': M, 'height': 1.67}, {'name': A, 'age': 44, 'gender': M, 'height': 1.92}, {'name': C, 'age': 33, 'gender': M, 'height': 1.57}; Place: 2")
I am only able to get the first age, and then the place...
g
[('30', '1')]
Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d", "o", and "g".
The groups() method of a match object returns a tuple containing all the subgroups of the match, from 1 up to however many groups are in the pattern. Its default argument is used for groups that did not participate in the match; it defaults to None.
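As a small illustration of groups() and its default argument, here is a minimal sketch. The pattern and the input "42" are made up for the example and are not taken from the question.

```python
import re

# Hypothetical pattern: a number, optionally followed by "-" and a second number.
pattern = re.compile(r"(\d+)(?:-(\d+))?")

m = pattern.match("42")

# The second group did not participate in the match, so it comes back as None...
print(m.groups())     # ('42', None)
# ...unless a different default is passed in.
print(m.groups("0"))  # ('42', '0')
```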
As far as I know, Python's re module cannot store every hit of a quantified capturing group in a list and then continue with another capturing group: a repeated group only keeps its last match.
The following performs only one regex search and one loop, though I admit it isn't very pretty.
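To see this limitation in isolation, here is a minimal sketch with a made-up input (not the string from the question). The group sits inside a repeated non-capturing group, and only its final repetition survives.

```python
import re

# Illustrative input only: three numbers, each followed by a comma.
m = re.match(r"(?:(\d+),)+", "30,20,45,")

# The capturing group matched three times, but earlier values are overwritten.
print(m.group(1))  # prints 45
```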
import re
r = re.compile(r"(age|Place)'?: (\d+)")
g = r.finditer("Members: {'name': A, 'age': 30, 'gender': M, 'height': 1.56}, {'name': C, 'age': 20, 'gender': M, 'height': 1.8}, {'name': H, 'age': 45, 'gender': M, 'height': 1.97}, {'name': D, 'age': 23, 'gender': M, 'height': 1.68}; Place: 1//Members: {'name': S, 'age': 33, 'gender': M, 'height': 1.4}, {'name': C, 'age': 19, 'gender': M, 'height': 1.67}, {'name': A, 'age': 44, 'gender': M, 'height': 1.92}, {'name': C, 'age': 33, 'gender': M, 'height': 1.57}; Place: 2")
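ages = []
ranks = {}
for m in g:
    if m[1] == 'age':
        # Collect ages until the next Place shows up.
        ages.append(m[2])
    else:
        # A Place closes the current group: store its ages under that Place.
        ranks[m[2]] = ages
        ages = []
print(ranks)
# {'1': ['30', '20', '45', '23'], '2': ['33', '19', '44', '33']}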
Basically just capture any age or Place, iterate over the matches. Store all ages into a list until we come across a Place, in which case we use the former list as a value and the Place as a key in a dictionary. Then we reset the list and start over.
Of course, the caveat is that this relies on Place always coming after the ages that belong to it.
Here's a way to get close to a solution using re.findall and itertools.groupby:
import re, itertools
r = re.compile(r'(?:\b(?:age|place)\'?\s*:\s*(\d+))|//|\Z', re.I)
x = r.findall("Members: {'name': A, 'age': 30, 'gender': M, 'height': 1.56}, {'name': C, 'age': 20, 'gender': M, 'height': 1.8}, {'name': H, 'age': 45, 'gender': M, 'height': 1.97}, {'name': D, 'age': 23, 'gender': M, 'height': 1.68}; Place: 1//Members: {'name': S, 'age': 33, 'gender': M, 'height': 1.4}, {'name': C, 'age': 19, 'gender': M, 'height': 1.67}, {'name': A, 'age': 44, 'gender': M, 'height': 1.92}, {'name': C, 'age': 33, 'gender': M, 'height': 1.57}; Place: 2")
Output:
['30', '20', '45', '23', '1', '', '33', '19', '44', '33', '2', '']
Splitting with a second pass:
o = [list(g[1]) for g in itertools.groupby(x, lambda i: i != '')][::2]
Output:
[['30', '20', '45', '23', '1'], ['33', '19', '44', '33', '2']]
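If the second structure from the question is preferred (ages as a list, Place as a separate item), the grouped lists can be reshaped with one more comprehension. This is only a sketch: it assumes the Place is always the last item of each group, and the name pairs is made up here. The value of o is repeated so the snippet runs on its own.

```python
# Result of the groupby step above, copied here so this runs standalone.
o = [['30', '20', '45', '23', '1'], ['33', '19', '44', '33', '2']]

# Assumption: the last element of each group is the Place, the rest are ages.
pairs = [(grp[:-1], grp[-1]) for grp in o]
print(pairs)
# [(['30', '20', '45', '23'], '1'), (['33', '19', '44', '33'], '2')]
```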