The Python 3 documentation for isdigit says:
Return true if all characters in the string are digits and there is at least one character, false otherwise. Digits include decimal characters and digits that need special handling, such as the compatibility superscript digits. Formally, a digit is a character that has the property value Numeric_Type=Digit or Numeric_Type=Decimal.
So it sounds like isdigit should be a superset of isdecimal. But then the docs for isdecimal say:
Return true if all characters in the string are decimal characters and there is at least one character, false otherwise. Decimal characters are those from general category “Nd”. This category includes digit characters, and all characters that can be used to form decimal-radix numbers, e.g. U+0660, ARABIC-INDIC DIGIT ZERO.
That sounds like isdecimal should be a superset of isdigit.
How are these methods related? Does one of them match a strict superset of what the other matches? Does the Numeric_Type property even have anything to do with the Nd category? (And is this contradictory documentation a documentation bug?)
The isdigit() method accepts decimal characters as well as digit characters such as subscripts and superscripts. The isdecimal() method returns True if all of the characters in a string are decimal characters and False otherwise; it accepts only decimals.
isdecimal() vs isdigit() vs isnumeric(): the isdecimal() method supports only decimal numbers. The isdigit() method supports decimals, subscripts, and superscripts. The isnumeric() method supports digits, vulgar fractions, subscripts, superscripts, Roman numerals, and currency numerators.
The Python isnumeric method differs from the isdigit method in one key respect: isdigit checks whether the string contains only digits, while isnumeric checks whether all the characters are numeric, a broader class that also covers characters such as fractions and Roman numerals.
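The isdecimal() method returns True if all the characters are decimal characters, meaning 0-9 and their counterparts in other scripts. In Python 2 this method exists only on unicode objects; in Python 3 it is a method of str.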
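As I found out, the correspondence between the string predicates that check for a numeric value and the Unicode general categories is approximately the following (isdigit matches only the part of No with Numeric_Type=Digit, so fractions such as ½ are excluded):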
isdecimal: Nd,
isdigit: No, Nd,
isnumeric: No, Nd, Nl,
isalnum: No, Nd, Nl, Lu, Lt, Lo, Lm, Ll,
E.g., ᛰ (RUNIC BELGTHOR SYMBOL, U+16F0) belongs to Nl, therefore:
'ᛰ'.isdecimal() # False
'ᛰ'.isdigit() # False
'ᛰ'.isnumeric() # True
'ᛰ'.isalnum() # True
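The same comparison can be made for a handful of other characters by printing each one's general category next to the three predicates. This is only a small sketch: the helper name describe and the choice of sample characters are mine, and the results depend on the Unicode database bundled with the interpreter.

```python
import unicodedata


def describe(ch):
    """Return (general category, isdecimal, isdigit, isnumeric) for one character."""
    return (unicodedata.category(ch), ch.isdecimal(), ch.isdigit(), ch.isnumeric())


# Sample characters chosen for illustration:
# ARABIC-INDIC DIGIT ZERO, SUPERSCRIPT THREE,
# VULGAR FRACTION ONE HALF, RUNIC BELGTHOR SYMBOL.
samples = ["\u0660", "\u00b3", "\u00bd", "\u16f0"]

for ch in samples:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), describe(ch))
```

Note that ³ and ½ are both in category No, yet only ³ passes isdigit, so the general category alone does not decide the result.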
The way I read section 4.6 of the Unicode 6.0 standard, the digit category is a superset of the decimal digit category.
Decimal digits, as commonly understood, are digits used to form decimal-radix numbers. They include script-specific digits, but exclude characters such as Roman numerals and Greek acrophonic numerals, which do not form decimal-radix expressions. (Note that <1, 5> = 15 = fifteen, but <I, V> = IV = four.)
The Numeric_Type=decimal property value (which is correlated with the General_Category=Nd property value) is limited to those numeric characters that are used in decimal-radix numbers and for which a full set of digits has been encoded in a contiguous range, with ascending order of Numeric_Value, and with the digit zero as the first code point in the range.
So the decimal category would exclude numeric characters such as Roman numerals, fractions, etc., as well as digits like superscripts that do not belong to a complete contiguous set.
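The "contiguous range starting at zero" requirement has a practical consequence in Python 3, which the following sketch tries to illustrate: characters that pass isdecimal can be converted with int(), while a character that only passes isdigit cannot. The helper safe_int and the variable names are hypothetical, introduced just for this example.

```python
import unicodedata

ZERO = 0x0660  # ARABIC-INDIC DIGIT ZERO, first code point of its digit range

# The ten code points starting at ZERO carry the decimal values 0..9 in order.
values = [unicodedata.decimal(chr(ZERO + i)) for i in range(10)]


def safe_int(s):
    """Return int(s), or None if int() rejects the string."""
    try:
        return int(s)
    except ValueError:
        return None


# Arabic-Indic digits one and five form a decimal-radix number.
arabic_indic = safe_int("\u0661\u0665")

# SUPERSCRIPT THREE passes isdigit() but has no decimal value.
superscript = safe_int("\u00b3")

print(values, arabic_indic, superscript)
```

If the goal is to validate input before calling int(), isdecimal is therefore the safer check of the two.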
The Python 3 documentation for str.isdecimal appears to have been corrected so it no longer says that decimals include digits:
str.isdecimal
Return true if all characters in the string are decimal characters and there is at least one character, false otherwise. Decimal characters are those that can be used to form numbers in base 10, e.g. U+0660, ARABIC-INDIC DIGIT ZERO. Formally a decimal character is a character in the Unicode General Category “Nd”.
The Python 2 documentation still appears to be wrong (doesn't match the 2.7.14 implementation) and consistently states that decimals include digits:
str.isdigit
Return true if all characters in the string are digits and there is at least one character, false otherwise. For 8-bit strings, this method is locale-dependent.
unicode.isdecimal
Return True if there are only decimal characters in S, False otherwise. Decimal characters include digit characters, and all characters that can be used to form decimal-radix numbers, e.g. U+0660, ARABIC-INDIC DIGIT ZERO.
A quick test of the character '³' in Python 2.7.14 shows that decimals do not include digits:
>>> u'\u00b3'.isdecimal()
False
>>> u'\u00b3'.isdigit()
True
Python 2 and 3 now have similar behavior (digits include decimals) matching the Python 3 documentation, whereas the Python 2 documentation is wrong.
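One way to check the superset question empirically on Python 3 is to collect every code point accepted by each predicate and compare the resulting sets. This is a brute-force sketch rather than a proof; the set names are mine, and the outcome reflects whatever Unicode database ships with the interpreter being used.

```python
import sys

decimal_chars = set()
digit_chars = set()
numeric_chars = set()

# Walk every code point and record which predicates accept it.
for cp in range(sys.maxunicode + 1):
    ch = chr(cp)
    if ch.isdecimal():
        decimal_chars.add(ch)
    if ch.isdigit():
        digit_chars.add(ch)
    if ch.isnumeric():
        numeric_chars.add(ch)

# "<" on sets tests for a strict (proper) subset.
print("decimal < digit:", decimal_chars < digit_chars)
print("digit < numeric:", digit_chars < numeric_chars)
print(len(decimal_chars), len(digit_chars), len(numeric_chars))
```

On CPython both comparisons should come out true, which agrees with the reading that digits strictly include decimals and that numeric characters strictly include digits.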