Python 2.6+ str.format() and regular expressions

str.format() is the new standard for formatting strings in Python 2.6 and Python 3, but I've run into an issue when using it with regular expressions.

I've written a regular expression to return all domains that are a single level below a specified domain or any domains that are 2 levels below the domain specified, if the 2nd level below is www...

Assuming the specified domain is delivery.com, my regex should return a.delivery.com, b.delivery.com, www.c.delivery.com ... but it should not return x.a.delivery.com.

import re

str1 = "www.pizza.delivery.com"
str2 = "w.pizza.delivery.com"
str3 = "pizza.delivery.com"

if re.match('^(w{3}\.)?([0-9A-Za-z-]+\.){1}delivery.com$', str1):
    print('String 1 matches!')
if re.match('^(w{3}\.)?([0-9A-Za-z-]+\.){1}delivery.com$', str2):
    print('String 2 matches!')
if re.match('^(w{3}\.)?([0-9A-Za-z-]+\.){1}delivery.com$', str3):
    print('String 3 matches!')

Running this should give the result:

String 1 matches!
String 3 matches!

Now, the problem is when I try to replace delivery.com dynamically using str.format...

if re.match('^(w{3}\.)?([0-9A-Za-z-]+\.){1}{domainName}$'.format(domainName='delivery.com'), str1):
    print('String 1 matches!')

This seems to fail because str.format() treats the {3} and {1} as replacement fields, i.e. as references to arguments passed to format(). (I'm assuming.)
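That assumption can be checked directly. The following is a minimal sketch (the names `template`, `fields`, and `error` are illustrative, not from the original post): it asks `string.Formatter` which replacement fields it sees in the pattern, then shows the exception that `format()` raises when only a keyword argument is supplied.

```python
import string

# The regex from the question, used unchanged as a format template.
template = r'^(w{3}\.)?([0-9A-Za-z-]+\.){1}{domainName}$'

# Formatter.parse() yields (literal_text, field_name, format_spec, conversion)
# tuples; collecting field_name shows what format() will try to substitute.
fields = [name for _, name, _, _ in string.Formatter().parse(template)
          if name is not None]
print(fields)

# {3} is read as "positional argument 3", which does not exist here,
# so format() raises IndexError before it ever reaches {domainName}.
error = None
try:
    template.format(domainName='delivery.com')
except IndexError as exc:
    error = exc
    print('format() failed: %s' % exc)
```

The regex quantifiers `{3}` and `{1}` show up in `fields` alongside `domainName`, which confirms that they are being parsed as replacement fields.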

I could concatenate the string using the + operator:

'^(w{3}\.)?([0-9A-Za-z-]+\.){1}' + domainName + '$' 

The question comes down to this: is it possible to use str.format() when the string (usually a regex) has "{n}" within it?

brildum asked Dec 09 '09 17:12


2 Answers

You would first need to format the string and then use it as the regex. It really isn't worth putting everything on a single line. Escaping is done by doubling the curly braces:

>>> pat = '^(w{{3}}\.)?([0-9A-Za-z-]+\.){{1}}{domainName}$'.format(domainName='delivery.com')
>>> pat
'^(w{3}\\.)?([0-9A-Za-z-]+\\.){1}delivery.com$'
>>> re.match(pat, str1)

Also, re.match only matches at the beginning of the string, so you don't need the ^ when you use re.match. You would need it with re.search, however.
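The difference between the two functions can be illustrated with a short sketch. The test string `x.a.delivery.com` is the "too deep" example from the question; the variable names are illustrative only.

```python
import re

# The question's pattern without the leading ^, and the same pattern with it.
unanchored = r'(w{3}\.)?([0-9A-Za-z-]+\.){1}delivery.com$'
anchored = '^' + unanchored

too_deep = 'x.a.delivery.com'  # should NOT be accepted

# re.match only tries position 0, so the missing ^ makes no difference.
match_result = re.match(unanchored, too_deep)

# re.search scans forward and finds a match starting at 'a.delivery.com'.
search_unanchored = re.search(unanchored, too_deep)

# With ^ in place, re.search is pinned to the start and rejects the string.
search_anchored = re.search(anchored, too_deep)

print(match_result, search_unanchored.group(0), search_anchored)
```

So dropping the ^ is only safe as long as the pattern is always used with re.match.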

Please note that {1} in a regex is redundant: a group matches exactly once by default.
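As a quick check of that point, this sketch runs the question's three test strings through the pattern with and without the `{1}` quantifier (the variable names are illustrative):

```python
import re

names = ['www.pizza.delivery.com', 'w.pizza.delivery.com', 'pizza.delivery.com']

# Pattern as written in the question, and the same pattern with {1} removed.
original = r'^(w{3}\.)?([0-9A-Za-z-]+\.){1}delivery.com$'
simplified = r'^(w{3}\.)?([0-9A-Za-z-]+\.)delivery.com$'

original_hits = [n for n in names if re.match(original, n)]
simplified_hits = [n for n in names if re.match(simplified, n)]

# Both patterns accept the same strings.
print(original_hits == simplified_hits)
```

Dropping `{1}` also leaves one fewer pair of braces to double when the pattern is passed through str.format().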

SilentGhost answered Sep 21 '22 17:09


Per the documentation, if you need a literal { or } to survive the formatting operation, use {{ and }} in the original string.

'^(w{{3}}\.)?([0-9A-Za-z-]+\.){{1}}{domainName}$'.format(domainName = 'delivery.com') 
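Wrapped into a reusable function, the doubled-brace approach might look like the sketch below. The helper name `domain_pattern` is hypothetical, and the call to `re.escape` is an addition that neither answer mentions: it is included on the assumption that the dot in the domain should match only a literal dot.

```python
import re

def domain_pattern(domain):
    """Build the question's regex for an arbitrary domain (hypothetical helper)."""
    # {{3}} and {{1}} come out of format() as the literal quantifiers {3} and {1};
    # only {domainName} is treated as a replacement field.
    template = r'^(w{{3}}\.)?([0-9A-Za-z-]+\.){{1}}{domainName}$'
    # re.escape turns 'delivery.com' into 'delivery\.com' so '.' is not a wildcard.
    return template.format(domainName=re.escape(domain))

pat = domain_pattern('delivery.com')
print(pat)

for name in ['www.pizza.delivery.com', 'w.pizza.delivery.com', 'pizza.delivery.com']:
    if re.match(pat, name):
        print('%s matches!' % name)
```

Without the escaping step, a string such as `pizza.deliveryXcom` would also match, because an unescaped `.` in a regex matches any character.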

Jonathan Feinberg answered Sep 20 '22 17:09