Regular expression in Python won't match end of a string

Tags: python, regex

I'm just learning Python, and I can't seem to figure out regular expressions.

r1 = re.compile("$.pdf")
if r1.match("spam.pdf"):
    print 'yes'
else:
    print 'no'

I want this code to print 'yes', but it obstinately prints 'no'. I've also tried each of the following:

r1 = re.compile(r"$.pdf")

r1 = re.compile("$ .pdf")

r1 = re.compile('$.pdf')

if re.match("$.pdf", "spam.pdf")

r1 = re.compile(".pdf")

Plus countless other variations. I've been searching for quite a while, but can't find/understand anything that solves my problem. Can someone help out a newbie?

user1634426 asked Aug 29 '12 23:08



2 Answers

You've tried all the variations except the one that works. The $ goes at the end of the pattern. Also, you'll want to escape the period so it actually matches a period (usually it matches any character).

r1 = re.compile(r"\.pdf$")
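As a rough illustration of why the escape and the position of $ matter, the sketch below compares an unescaped dot, an escaped dot, and the original "$.pdf" pattern. The test string "spamXpdf" and the variable names are made up for the example, and re.search() is used so the pattern may match anywhere in the string.

```python
import re

# Unescaped dot: "." matches any character, so "spamXpdf" is accepted too.
loose = re.compile(r".pdf$")
# Escaped dot: only a literal "." followed by "pdf" at the end matches.
strict = re.compile(r"\.pdf$")

loose_hit = loose.search("spamXpdf") is not None    # True
strict_hit = strict.search("spamXpdf") is not None  # False
strict_pdf = strict.search("spam.pdf") is not None  # True

# With default flags, "$" followed by more characters cannot match here:
# nothing comes after the end of the string.
dollar_first = re.search(r"$.pdf", "spam.pdf") is not None  # False

print(loose_hit, strict_hit, strict_pdf, dollar_first)
```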

However, an easier and clearer way to do this is using the string's .endswith() method:

if filename.endswith(".pdf"):
    # do something

That way you don't have to decipher the regular expression to understand what's going on.
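A minimal sketch of the .endswith() approach, assuming you also want to accept upper-case extensions. The helper name is_pdf and the sample filenames are hypothetical, not part of the question.

```python
def is_pdf(filename):
    """Return True if filename ends with ".pdf", ignoring case."""
    return filename.lower().endswith(".pdf")


# endswith() also accepts a tuple of suffixes to test several at once.
is_document = "notes.txt".endswith((".pdf", ".txt"))

print(is_pdf("spam.pdf"), is_pdf("SPAM.PDF"), is_pdf("spam.pdf.bak"), is_document)
```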

kindall answered Sep 30 '22 23:09


Behaviour of re.match() and re.search()

There is one significant difference: re.match() only looks for a match at the beginning of the string, while re.search() scans the whole string. You are most likely looking for re.search().

The Python documentation compares the two methods in the section called "search() vs. match()".
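The difference can be seen in a short sketch using the pattern and filename from the question. The variable names are only for illustration, and the last line shows one possible way to keep re.match() by letting the pattern consume the start of the string.

```python
import re

pattern = re.compile(r"\.pdf$")
name = "spam.pdf"

# match() only tries position 0, where the string starts with "s", not ".".
at_start = pattern.match(name)

# search() scans forward and finds ".pdf" at the end of the string.
anywhere = pattern.search(name)

# match() can still work if the pattern accounts for the leading characters.
with_prefix = re.match(r".*\.pdf$", name)

print(at_start is None, anywhere.group(), with_prefix.group())
```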

Special characters in regular expression

Also, these characters mean something different in regular expressions from the way you are trying to use them (see Regular Expression Syntax for details):

  • ^ matches the beginning:

    (Caret.) Matches the start of the string, and in MULTILINE mode also matches immediately after each newline.

  • $ matches the end:

    Matches the end of the string or just before the newline at the end of the string, and in MULTILINE mode also matches before a newline. foo matches both ‘foo’ and ‘foobar’, while the regular expression foo$ matches only ‘foo’. More interestingly, searching for foo.$ in 'foo1\nfoo2\n' matches ‘foo2’ normally, but ‘foo1’ in MULTILINE mode; searching for a single $ in 'foo\n' will find two (empty) matches: one just before the newline, and one at the end of the string.
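The examples quoted from the documentation above can be checked directly. This is a small sketch with illustrative variable names; the strings are the ones used in the quoted text.

```python
import re

text = "foo1\nfoo2\n"

# Default mode: "$" matches only at the end of the string
# (or just before a trailing newline).
default_hit = re.search(r"foo.$", text).group()

# MULTILINE mode: "$" also matches before each newline.
multiline_hit = re.search(r"foo.$", text, re.MULTILINE).group()

# A lone "$" in 'foo\n' gives two empty matches:
# one just before the newline and one at the very end.
empty_matches = re.findall(r"$", "foo\n")

# "^" matches only at the start by default, and after each newline in MULTILINE mode.
starts_default = re.findall(r"^foo", text)
starts_multiline = re.findall(r"^foo", text, re.MULTILINE)

print(default_hit, multiline_hit, empty_matches)
print(starts_default, starts_multiline)
```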

Complete answer

The solution you are looking for may be:

import re
r1 = re.compile(r"\.pdf$")  # regular expression corrected; raw string keeps the backslash intact
if r1.search("spam.pdf"):  # re.match() replaced with re.search()
    print("yes")
else:
    print("no")

which checks whether the string ends with ".pdf". It does the same thing as the .endswith() call in kindall's answer; if that works for you, prefer it (it is cleaner, as you may not need regular expressions at all).

Tadeck answered Oct 01 '22 01:10