Can a regular expression match whitespace or the start of a string?
I'm trying to replace currency the abbreviation GBP with a £ symbol. I could just match anything starting GBP, but I'd like to be a bit more conservative, and look for certain delimiters around it.
>>> import re >>> text = u'GBP 5 Off when you spend GBP75.00' >>> re.sub(ur'GBP([\W\d])', ur'£\g<1>', text) # matches GBP with any prefix u'\xa3 5 Off when you spend \xa375.00' >>> re.sub(ur'^GBP([\W\d])', ur'£\g<1>', text) # matches at start only u'\xa3 5 Off when you spend GBP75.00' >>> re.sub(ur'(\W)GBP([\W\d])', ur'\g<1>£\g<2>', text) # matches whitespace prefix only u'GBP 5 Off when you spend \xa375.00'
Can I do both of the latter examples at the same time?
Yes, the dot regex matches whitespace characters when using Python's re module. What is this? The dot matches all characters in the string --including whitespaces. You can see that there are many whitespace characters ' ' among the matched characters.
\s stands for “whitespace character”. Again, which characters this actually includes, depends on the regex flavor. In all flavors discussed in this tutorial, it includes [ \t\r\n\f]. That is: \s matches a space, a tab, a carriage return, a line feed, or a form feed.
An empty regular expression matches everything.
Use the OR "|
" operator:
>>> re.sub(r'(^|\W)GBP([\W\d])', u'\g<1>£\g<2>', text) u'\xa3 5 Off when you spend \xa375.00'
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With