I have always used Stack Overflow to solve my problems by searching existing threads. Today I would like some guidance on creating a regex pattern for my text files. The headings in my files vary and do not all share the same naming scheme, but they roughly follow this pattern:
2.0 DESCRIPTION
3.0 PLACE OF PERFORMANCE
5.0 SERVICES RETAINED
6.0 STRUCTURE AND ROLES
etc....
Each heading is a number followed by capital letters, or a number followed by spaces and then capital letters. The output I need is a list:
output = ['2.0 DESCRIPTION','3.0 PLACE OF PERFORMANCE','5.0 SERVICES RETAINED','6.0 STRUCTURE AND ROLES']
I am extremely new to Python and regex. I tried the following, but it did not give me the desired output:
import re
text = '''2.0 DESCRIPTION
some text here
3.0 SERVICES
som text
5.0 SERVICES RETAINED
some text
6.0 STRUCTURE AND ROLES
sometext'''
pattern = r"\d\s[A-Z][A-Z]+"
matches = re.findall(pattern,text)
But it returned:
['0 DESCRIPTION', '0 SERVICES', '0 SERVICES', '0 STRUCTURE']
This is not the output I was looking for. Any guidance on finding the right pattern would be much appreciated.
Cheers, Abhishek
You may use
matches = re.findall(r'^\d+(?:\.\d+)* *[A-Z][A-Z ]*$', text, re.M)
See the regex demo.
Here,
^ - start of a line (re.M redefines the behavior of ^ so that it also matches at the start of each line)
\d+(?:\.\d+)* - 1+ digits and then 0+ sequences of a . and 1+ digits
 * (a space followed by *) - zero or more spaces
[A-Z][A-Z ]* - an uppercase letter and then 0 or more uppercase letters or spaces
$ - end of a line.
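As a quick check, here is a minimal sketch that runs the pattern on the sample text from the question. The names HEADING_RE and find_headings are only illustrative and are not part of the original code; the pattern itself is the one given above.

```python
import re

# Same pattern as in the answer, compiled once with re.M so that
# ^ and $ match at the start and end of every line.
# HEADING_RE and find_headings are illustrative names only.
HEADING_RE = re.compile(r'^\d+(?:\.\d+)* *[A-Z][A-Z ]*$', re.M)


def find_headings(text):
    """Return every line that is a dotted number followed by uppercase words."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return HEADING_RE.findall(text)


text = '''2.0 DESCRIPTION
some text here
3.0 SERVICES
som text
5.0 SERVICES RETAINED
some text
6.0 STRUCTURE AND ROLES
sometext'''

print(find_headings(text))
# ['2.0 DESCRIPTION', '3.0 SERVICES', '5.0 SERVICES RETAINED', '6.0 STRUCTURE AND ROLES']
```

Because the pattern is anchored with ^ and $, a line only matches when the whole line is a heading, so lines with lowercase letters or with text before the number are skipped.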