 

Is it possible to get an array as one matching group in Python regex


I have the following string:

Members: {'name': A, 'age': 30, 'gender': M, 'height': 1.56}, {'name': C, 'age': 20, 'gender': M, 'height': 1.8}, {'name': H, 'age': 45, 'gender': M, 'height': 1.97}, {'name': D, 'age': 23, 'gender': M, 'height': 1.68}; Place: 1//Members: {'name': S, 'age': 33, 'gender': M, 'height': 1.4}, {'name': C, 'age': 19, 'gender': M, 'height': 1.67}, {'name': A, 'age': 44, 'gender': M, 'height': 1.92}, {'name': C, 'age': 33, 'gender': M, 'height': 1.57}; Place: 2

I would like to know if it is possible/how to have a match like:

[['30', '20', '45', '23', '1'], ['33', '19', '44', '33', '2']] or

[(['30', '20', '45', '23'], '1'), (['33', '19', '44', '33'], '2')]

Or something similar (the resulting structure doesn't really matter); I just need all the ages for each Place. I know that I can split the string and apply a regex to each part, or use a similar solution, but my question is whether there is a way to do it in ONE single step using regex...

I would use findall to get all the "full matches". My issue is getting the first element of each "tuple" as an array...

If I do:

import re

r = re.compile(r"'age': (\d+).*?; Place: (\d+).*?//")
g = r.findall("Members: {'name': A, 'age': 30, 'gender': M, 'height': 1.56}, {'name': C, 'age': 20, 'gender': M, 'height': 1.8}, {'name': H, 'age': 45, 'gender': M, 'height': 1.97}, {'name': D, 'age': 23, 'gender': M, 'height': 1.68}; Place: 1//Members: {'name': S, 'age': 33, 'gender': M, 'height': 1.4}, {'name': C, 'age': 19, 'gender': M, 'height': 1.67}, {'name': A, 'age': 44, 'gender': M, 'height': 1.92}, {'name': C, 'age': 33, 'gender': M, 'height': 1.57}; Place: 2")

I am only able to get the first age, and then the place...

g
[('30', '1')]
Gabrielle asked Apr 23 '18 18:04




2 Answers

As far as I know, Python's re module cannot collect every hit of a quantified capturing group into a list and follow it with another capturing group; a repeated group keeps only its last match.
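A minimal sketch of that limitation, assuming a shortened string in the same shape as the one in the question (the `sample` value and the pattern here are made up for illustration): the age group matches three times, but only the last repetition survives.

```python
import re

# Shortened sample in the same shape as the question's string (assumed for illustration).
sample = "{'age': 30}, {'age': 20}, {'age': 45}; Place: 1"

# The (\d+) for the age sits inside a repeated non-capturing group,
# so it participates in the match three times.
m = re.match(r"(?:\{'age': (\d+)\}(?:, )?)+; Place: (\d+)", sample)

# Only the last repetition of the age group is kept.
print(m.groups())  # ('45', '1')
```

This is why a single findall call cannot return a list of ages inside each tuple; some post-processing of the matches is needed.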

The following performs only one regex search and one loop, though I admit it isn't very pretty.

import re

r = re.compile(r"(age|Place)'?: (\d+)")

g = r.finditer("Members: {'name': A, 'age': 30, 'gender': M, 'height': 1.56}, {'name': C, 'age': 20, 'gender': M, 'height': 1.8}, {'name': H, 'age': 45, 'gender': M, 'height': 1.97}, {'name': D, 'age': 23, 'gender': M, 'height': 1.68}; Place: 1//Members: {'name': S, 'age': 33, 'gender': M, 'height': 1.4}, {'name': C, 'age': 19, 'gender': M, 'height': 1.67}, {'name': A, 'age': 44, 'gender': M, 'height': 1.92}, {'name': C, 'age': 33, 'gender': M, 'height': 1.57}; Place: 2")

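ages = []   # ages collected since the last Place
ranks = {}  # maps each Place to its list of ages
for m in g:
  if m[1] == 'age':
    ages.append(m[2])
  else:
    # A Place closes the current group of ages
    ranks[m[2]] = ages
    ages = []

print(ranks)
# {'1': ['30', '20', '45', '23'], '2': ['33', '19', '44', '33']}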

Basically, this captures every age or Place and iterates over the matches. All ages are stored in a list until we come across a Place, at which point the list becomes the value and the Place becomes the key in a dictionary. Then we reset the list and start over.

Of course, the caveat is that this only works if each Place comes after its ages.
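If the tuple shape from the question is preferred over a dictionary, the same loop can be sketched as a small function. This is only an illustration: the name `ages_by_place` and the shortened `sample` string are assumptions, not part of the original code.

```python
import re

def ages_by_place(text):
    """Collect ages until a Place is seen, then emit an (ages, place) tuple."""
    result, ages = [], []
    for m in re.finditer(r"(age|Place)'?: (\d+)", text):
        if m.group(1) == "age":
            ages.append(m.group(2))
        else:
            # A Place closes the current group of ages.
            result.append((ages, m.group(2)))
            ages = []
    return result

# Shortened sample in the same shape as the question's string.
sample = ("Members: {'name': A, 'age': 30, 'gender': M, 'height': 1.56}, "
          "{'name': C, 'age': 20, 'gender': M, 'height': 1.8}; Place: 1//"
          "Members: {'name': S, 'age': 33, 'gender': M, 'height': 1.4}; Place: 2")

print(ages_by_place(sample))
# [(['30', '20'], '1'), (['33'], '2')]
```

Returning a list of tuples also avoids one group overwriting another if the same Place number appears twice, which would happen with a dictionary keyed by Place.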

Bram Vanroy answered Sep 28 '22 19:09


Here's a way to get close to a solution using re.findall and itertools.groupby:

import itertools
import re
r = re.compile(r'(?:\b(?:age|place)\'?\s*:\s*(\d+))|//|\Z', re.I)
x = r.findall("Members: {'name': A, 'age': 30, 'gender': M, 'height': 1.56}, {'name': C, 'age': 20, 'gender': M, 'height': 1.8}, {'name': H, 'age': 45, 'gender': M, 'height': 1.97}, {'name': D, 'age': 23, 'gender': M, 'height': 1.68}; Place: 1//Members: {'name': S, 'age': 33, 'gender': M, 'height': 1.4}, {'name': C, 'age': 19, 'gender': M, 'height': 1.67}, {'name': A, 'age': 44, 'gender': M, 'height': 1.92}, {'name': C, 'age': 33, 'gender': M, 'height': 1.57}; Place: 2")

Output:

['30', '20', '45', '23', '1', '', '33', '19', '44', '33', '2', '']

Splitting with a second pass:

o = [list(g[1]) for g in itertools.groupby(x, lambda i: i != '')][::2]

Output:

[['30', '20', '45', '23', '1'], ['33', '19', '44', '33', '2']]
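To get the other shape from the question, with the Place separated from the ages, the last item of each group can be split off. A rough sketch, assuming a shortened `sample` string and using `key=bool` with a filter in place of the `[::2]` slice (the names `hits`, `groups` and `pairs` are mine):

```python
import itertools
import re

pattern = re.compile(r"(?:\b(?:age|place)'?\s*:\s*(\d+))|//|\Z", re.I)

# Shortened sample in the same shape as the question's string.
sample = "{'age': 30}, {'age': 20}; Place: 1//{'age': 33}; Place: 2"

# Separators ('//' and end of string) show up as empty strings.
hits = pattern.findall(sample)

# Keep only the runs of non-empty hits; bool('') is False.
groups = [list(g) for is_number, g in itertools.groupby(hits, key=bool) if is_number]

# The Place is always the last hit in each run.
pairs = [(g[:-1], g[-1]) for g in groups]
print(pairs)
# [(['30', '20'], '1'), (['33'], '2')]
```

Filtering on the groupby key does not depend on the first run being a run of numbers, which the `[::2]` slice assumes.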
ekhumoro answered Sep 28 '22 19:09