In java I can use the regex : \p{Sc}
for detecting currency symbol in text. What is the equivalent in Python?
$ means "Match the end of the string" (the position after the last character in the string).
A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation.
Special Regex Characters: These characters have special meaning in regex (to be discussed below): . , + , * , ? , ^ , $ , ( , ) , [ , ] , { , } , | , \ . Escape Sequences (\char): To match a character having special meaning in regex, you need to use a escape sequence prefix with a backslash ( \ ).
[] denotes a character class. () denotes a capturing group. [a-z0-9] -- One character that is in the range of a-z OR 0-9. (a-z0-9) -- Explicit capture of a-z0-9 .
You can use the unicode category if you use regex
package:
>>> import regex >>> regex.findall(r'\p{Sc}', '$99.99 / €77') # Python 3.x ['$', '€']
>>> regex.findall(ur'\p{Sc}', u'$99.99 / €77') # Python 2.x (NoteL unicode literal) [u'$', u'\xa2'] >>> print _[1] ¢
UPDATE
Alterantive way using unicodedata.category
:
>>> import unicodedata >>> [ch for ch in '$99.99 / €77' if unicodedata.category(ch) == 'Sc'] ['$', '€']
If you want to stick with re, supply the characters from Sc manually:
u"[$¢£¤¥֏؋৲৳৻૱௹฿៛\u20a0-\u20bd\ua838\ufdfc\ufe69\uff04\uffe0\uffe1\uffe5\uffe6]"
will do.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With