Kind of a noob here, apologies if I misstep.
I'm learning regular expressions and am on this lesson: https://regexone.com/lesson/capturing_groups.
In the Python interpreter, I'm trying to use parentheses to capture only what precedes the .pdf part of the search string, but my result still includes .pdf despite the parens. What am I doing wrong?
    import re

    string_one = 'file_record_transcript.pdf'
    string_two = 'file_07241999.pdf'
    string_three = 'testfile_fake.pdf.tmp'

    pattern = '^(file.+)\.pdf$'

    a = re.search(pattern, string_one)
    b = re.search(pattern, string_two)
    c = re.search(pattern, string_three)

    print(a.group() if a is not None else 'Not found')
    print(b.group() if b is not None else 'Not found')
    print(c.group() if c is not None else 'Not found')
Returns
    file_record_transcript.pdf
    file_07241999.pdf
    Not found
But should return
    file_record_transcript
    file_07241999
    Not found
Thanks!
Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d", "o", and "g".
The groups() method: called on a match object (not on the re module itself), this method returns a tuple containing all the subgroups of the match, from 1 up to however many groups are in the pattern.
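As a rough illustration of groups(), here is a small sketch. The two-group pattern is made up for this example (it is not from the question); it just adds a second group around the extension so the tuple has more than one element.

```python
import re

# Hypothetical pattern with two capture groups: the base name and the extension.
match = re.search(r'^(file.+)\.(pdf)$', 'file_07241999.pdf')

if match is not None:
    # groups() returns every captured group as a tuple, starting from group 1.
    print(match.groups())   # ('file_07241999', 'pdf')
    # Each element is also available individually by its number.
    print(match.group(1))   # file_07241999
    print(match.group(2))   # pdf
```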
Match objects in Python regex: a match object contains information about a particular regex match, such as the position in the string where the match was found and the contents of any capture groups. You work with match objects through their methods; for example, match.group() with no argument returns the entire matched text.
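To make the "position" and "contents" parts concrete, here is a minimal sketch using the standard start(), end() and span() methods of a match object. The input string is invented for the example.

```python
import re

# Illustrative input: the file name does not start at index 0.
match = re.search(r'(file.+)\.pdf', 'see file_07241999.pdf')

print(match.group())               # whole match: file_07241999.pdf
print(match.start(), match.end())  # where the whole match sits: 4 21
print(match.group(1))              # contents of group 1: file_07241999
print(match.span(1))               # where group 1 sits: (4, 17)
```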
In Python's regex syntax, unescaped parentheses group the regex between them (escaped parentheses, \( and \), match literal parenthesis characters instead). A group captures the text matched by the regex inside it into a numbered group that can be reused with a numbered backreference. Groups also let you apply regex operators, such as quantifiers, to the entire grouped regex.
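A short sketch of both uses, assuming Python's re syntax: a quantifier applied to a whole group, and a numbered backreference (\1) that must match the same text group 1 captured. The sample strings are made up.

```python
import re

# Quantifier on a group: "+" applies to the whole unit "dog", not just "g".
repeated = re.fullmatch(r'(dog)+', 'dogdogdog')
print(repeated.group())    # dogdogdog
print(repeated.group(1))   # dog (a repeated group keeps its last repetition)

# Backreference: \1 matches whatever group 1 matched, so this finds a doubled word.
doubled = re.search(r'\b(\w+) \1\b', 'it was the the best')
print(doubled.group())     # the the
print(doubled.group(1))    # the
```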
You need the first captured group:
    a.group(1)
    b.group(1)
    ...
Without a group number as the argument, group() returns the full match (it is the same as group(0)), which is what you're getting now.
Here's an example:
    In [8]: string_one = 'file_record_transcript.pdf'

    In [9]: re.search(r'^(file.*)\.pdf$', string_one).group()
    Out[9]: 'file_record_transcript.pdf'

    In [10]: re.search(r'^(file.*)\.pdf$', string_one).group(1)
    Out[10]: 'file_record_transcript'
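Putting that together with the script from the question, one way it could look is sketched below. The helper name base_name is made up for this example, and the pattern is written as a raw string so the backslash in \. reaches the regex engine unchanged.

```python
import re

# Same pattern as in the question, as a raw string.
pattern = r'^(file.+)\.pdf$'

def base_name(filename):
    """Return the text captured by group 1, or 'Not found' (hypothetical helper)."""
    match = re.search(pattern, filename)
    # group(1) is only the parenthesized part; group() would include '.pdf'.
    return match.group(1) if match is not None else 'Not found'

for name in ('file_record_transcript.pdf',
             'file_07241999.pdf',
             'testfile_fake.pdf.tmp'):
    print(base_name(name))
```

This should print file_record_transcript, file_07241999 and Not found, one per line, which is the output the question asks for.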