 

Capture groups with Regular Expression (Python)

Tags:

python

regex

Kind of a noob here, apologies if I misstep.

I'm learning regular expressions and am on this lesson: https://regexone.com/lesson/capturing_groups.

In the Python interpreter, I'm trying to use parentheses to capture only what precedes the .pdf part of the search string, but my result still includes .pdf despite the parens. What am I doing wrong?

    import re

    string_one = 'file_record_transcript.pdf'
    string_two = 'file_07241999.pdf'
    string_three = 'testfile_fake.pdf.tmp'

    pattern = '^(file.+)\.pdf$'
    a = re.search(pattern, string_one)
    b = re.search(pattern, string_two)
    c = re.search(pattern, string_three)

    print(a.group() if a is not None else 'Not found')
    print(b.group() if b is not None else 'Not found')
    print(c.group() if c is not None else 'Not found')

Returns

    file_record_transcript.pdf
    file_07241999.pdf
    Not found

But should return

    file_record_transcript
    file_07241999
    Not found

Thanks!

L. Robinson asked Feb 10 '18 10:02






1 Answer

You need the first captured group:

    a.group(1)
    b.group(1)
    ...

Called without an argument, group() returns the full match (the same as group(0)), which is what you're getting now. Passing 1 returns only the text matched inside the first pair of parentheses.

Here's an example:

    In [8]: string_one = 'file_record_transcript.pdf'

    In [9]: re.search(r'^(file.*)\.pdf$', string_one).group()
    Out[9]: 'file_record_transcript.pdf'

    In [10]: re.search(r'^(file.*)\.pdf$', string_one).group(1)
    Out[10]: 'file_record_transcript'
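
For completeness, here is a minimal sketch of the same fix applied to all three strings from the question. The helper name `capture_stem` is made up for illustration and is not part of the original code. The pattern is written as a raw string so the backslash in `\.` reaches the regex engine unchanged.

```python
import re

strings = [
    'file_record_transcript.pdf',
    'file_07241999.pdf',
    'testfile_fake.pdf.tmp',
]

# Raw string: the backslash in \. is passed through to the regex engine as-is.
pattern = r'^(file.+)\.pdf$'


def capture_stem(text):
    """Hypothetical helper: return group 1 of the match, or 'Not found'."""
    match = re.search(pattern, text)
    # group(1) is only the parenthesised part; group() would be the full match.
    return match.group(1) if match is not None else 'Not found'


results = [capture_stem(s) for s in strings]
for result in results:
    print(result)
```

The third string is rejected because `^file` requires the text to start with `file`, and `$` requires it to end right after `.pdf`.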

heemayl answered Oct 15 '22 08:10