
The dollar sign in Python regular expressions

Tags: python, regex

I am working on a small program whose purpose is to find website URLs (it is about the simplest one you could imagine). Here is the relevant portion:

webURLregex = re.compile(r'''(
   (https://|http://)
   ([a-zA-Z0-9.%+\\/_-]+)
   ([a-zA-Z0-9%+\\/_-]$)
   )''',re.VERBOSE)

Even though I use the findall method to search the pasted string, the program gives me only one result, although the copied text contains more than five URLs. When I delete the dollar sign, it works properly.

I understand that the dollar sign is unnecessary, because the only purpose of the line it sits on is to avoid matching the last character if it happens to be a comma or a dot. Still, I thought the dollar sign could not change the output at all, and apparently it does.

Of the six results I get from the version without the dollar sign, only one remains when I add it (for reasons I cannot see, since all six URLs follow the same pattern). I also tried placing the dollar sign right after the closing parenthesis of the group, and the output is again a single string.

Any idea about how and why this occurs would be appreciated.

Thanks in advance.

WilliamFrog8 asked May 18 '26 08:05



1 Answer

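The $ is not a literal character here: it is an anchor that matches only at the end of the string (or just before a trailing newline), so only a URL at the very end of the pasted text can match. If you do not need the anchor, simply remove it. If you actually want to match a literal dollar sign, use \$ instead of $: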

webURLregex = re.compile(r'''(
   (https://|http://)
   ([a-zA-Z0-9.%+\\/_-]+)
   ([a-zA-Z0-9%+\\/_-]\$)
   )''',re.VERBOSE)
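
To see what each variant does, here is a minimal sketch that runs the pattern from the question with $, with \$, and with no suffix at all. The sample text, the example.* URLs, and the helper names build_pattern and find_urls are assumptions made for illustration; the pasted string from the question is not shown. By default $ matches only at the end of the string (or just before a trailing newline), and re.MULTILINE extends that to the end of every line.

```python
import re

def build_pattern(tail, flags=0):
    # Hypothetical helper: rebuilds the pattern from the question and places
    # `tail` ("$", r"\$" or "") right after the final character class.
    return re.compile(r'''(
       (https://|http://)
       ([a-zA-Z0-9.%+\\/_-]+)
       ([a-zA-Z0-9%+\\/_-]''' + tail + r''')
       )''', re.VERBOSE | flags)

def find_urls(pattern, text):
    # Group 1 is the outer group, i.e. the whole URL.
    return [m.group(1) for m in pattern.finditer(text)]

# Assumed sample input with three URLs, two of them followed by punctuation.
text = "See http://example.com/a, https://example.org/b. and https://example.net/c"

# "$" is an anchor: only the URL that ends the string can match.
anchored = find_urls(build_pattern("$"), text)

# Without the anchor every URL is found, minus trailing "," or ".".
unanchored = find_urls(build_pattern(""), text)

# r"\$" requires a literal dollar sign, which this text does not contain.
escaped = find_urls(build_pattern(r"\$"), text)

# With re.MULTILINE, "$" also matches at the end of each line.
lines = "http://example.com/a\nhttps://example.org/b\n"
per_line = find_urls(build_pattern("$", re.MULTILINE), lines)

print(anchored)    # ['https://example.net/c']
print(unanchored)  # ['http://example.com/a', 'https://example.org/b', 'https://example.net/c']
print(escaped)     # []
print(per_line)    # ['http://example.com/a', 'https://example.org/b']
```

In this sample only the pattern without any suffix finds all three URLs in running text. If the URLs are known to sit one per line, keeping $ together with re.MULTILINE is another option.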
Janith answered May 20 '26 22:05



