
Regex pattern to find n non-space characters of x length after a certain substring

I am using the regex pattern pattern = r'cig[\s:.]*(\w{10})' to extract the 10 characters that follow 'cig' in each row of my dataframe. This pattern handles every case except the ones where the 10-character substring itself contains spaces.

For example, I am trying to extract Z9F27D2198 from the string

/BENEF/FORNITURA GAS FEB-20 CIG Z9F                 27D2198 01762-0000031

Stack Overflow may have reformatted the string above, but there should be 17 whitespace characters between F and 2, after CIG.

Could you help me edit the regex pattern so that it accounts for whitespace inside that 10-character substring? I am also passing flags=re.I to my re.findall calls to ignore case.

To give an example string for which this pattern works:

CIG7826328A2B FORNITURA ENERGIA ELETTRICA U TENZE COMUNALI CONVENZIONE CONSIP E

and it outputs what I want: 7826328A2B.
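A minimal reproduction of the two cases, assuming the strings are held in plain Python variables rather than in the dataframe. The names `working` and `failing` are only illustrative, and the gap is built with `" " * 17` so the whitespace cannot be collapsed by formatting:

```python
import re

# Original pattern from the question: expects 10 consecutive word characters
pattern = r'cig[\s:.]*(\w{10})'

# Sample strings taken from the question
working = "CIG7826328A2B FORNITURA ENERGIA ELETTRICA U TENZE COMUNALI CONVENZIONE CONSIP E"
failing = "/BENEF/FORNITURA GAS FEB-20 CIG Z9F" + " " * 17 + "27D2198 01762-0000031"

print(re.findall(pattern, working, flags=re.I))  # ['7826328A2B']
print(re.findall(pattern, failing, flags=re.I))  # [] because \w cannot match the spaces
```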

Thanks in advance.

Massimiliano Garzoni asked Dec 28 '25 16:12



1 Answer

You can use

r'(?i)cig[\s:.]*(\S(?:\s*\S){9})(?!\S)'

See the regex demo. Details:

  • cig - a cig string
  • [\s:.]* - zero or more whitespaces, : or .
  • (\S(?:\s*\S){9}) - Group 1: a non-whitespace char, then nine occurrences of zero or more whitespaces followed by a non-whitespace char (ten non-whitespace chars in total)
  • (?!\S) - immediately to the right, there must be a whitespace or end of string.
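To see what the trailing (?!\S) contributes, here is a small sketch that compares the pattern with and without it. The input `too_long` is a made-up string with eleven non-whitespace characters after CIG, and the names `with_guard` and `without_guard` are only illustrative:

```python
import re

with_guard = re.compile(r'cig[\s:.]*(\S(?:\s*\S){9})(?!\S)', re.I)
without_guard = re.compile(r'cig[\s:.]*(\S(?:\s*\S){9})', re.I)

# Hypothetical inputs: the first has exactly ten code characters, the second eleven
exact = "CIG Z9F 27D2198 rest"
too_long = "CIG Z9F 27D2198X"

print(with_guard.findall(exact))        # ['Z9F 27D2198']
print(without_guard.findall(too_long))  # ['Z9F 27D2198'], the trailing X is silently cut off
print(with_guard.findall(too_long))     # [], the lookahead rejects the over-long run
```

Without the lookahead, the pattern would accept the first ten characters of a longer token; with it, the tenth character must be followed by whitespace or the end of the string.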

In Python, you can use

import re
text = "/BENEF/FORNITURA GAS FEB-20 CIG Z9F               27D2198 01762-0000031"
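# (?!\S) requires whitespace or end of string right after the tenth character
pattern = r'cig[\s:.]*(\S(?:\s*\S){9})(?!\S)'
matches = re.finditer(pattern, text, re.I)
for match in matches:
    # Remove the inner whitespace from the captured group before printing
    print(re.sub(r'\s+', '', match.group(1)), ' found at ', match.span(1))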

# => Z9F27D2198  found at  (32, 57)

See the Python demo.
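Since the question uses re.findall over many rows, the same idea could be wrapped in a small helper. This is only a sketch: `extract_cigs` and `CIG_RE` are hypothetical names, and a plain list stands in for the dataframe column, whose layout is not shown in the question:

```python
import re

# Precompiled so the pattern is not re-parsed for every row
CIG_RE = re.compile(r'cig[\s:.]*(\S(?:\s*\S){9})(?!\S)', re.I)

def extract_cigs(line):
    """Return every 10-character code found after 'cig', with inner whitespace removed."""
    return [re.sub(r'\s+', '', raw) for raw in CIG_RE.findall(line)]

# Sample rows: the two strings from the question plus one without a code
lines = [
    "CIG7826328A2B FORNITURA ENERGIA ELETTRICA U TENZE COMUNALI CONVENZIONE CONSIP E",
    "/BENEF/FORNITURA GAS FEB-20 CIG Z9F" + " " * 17 + "27D2198 01762-0000031",
    "no code here",
]

results = [extract_cigs(line) for line in lines]
print(results)  # [['7826328A2B'], ['Z9F27D2198'], []]
```

If the rows live in a pandas column, passing `extract_cigs` to `Series.apply` should give one list per row, assuming the column holds strings.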

Wiktor Stribiżew answered Dec 30 '25 04:12



