Can anybody please tell me why the string comparisons below deliver these results?
>> "1040" <= "12000"
True
>> "1040" <= "10000"
False
I've tried the string comparison in both C and Python. The result is obviously correct, but I can't figure out how it is calculated.
P.S.: I know that comparing strings of different length is something you shouldn't do, but I'm still wondering about the logic behind the above lines ;-)
"1" is equal to "1".
"0" comes before "2" (so "1040" < "12000").
"4" comes after "0" (so "1040" > "10000").
The fancy word describing this ordering is "lexicographical order" (sometimes "dictionary order"). In everyday language we just refer to it as "alphabetical order". What this means is that we first place an ordering on our alphabet (A, B, ..., Z, etc.) and then, to compare two words over this alphabet, we compare one character at a time until we find two non-equal characters in the same position, and return the comparison between these two characters.
Example: The "natural" ordering on the alphabet { A, B, C, ..., Z } is that A < B < C < ... < Z. Given two words s = s_1s_2...s_m and t = t_1t_2...t_n we compare s_1 to t_1. If s_1 < t_1 we say that s < t. If s_1 > t_1 we say that s > t. If s_1 = t_1 we recurse on the words s_2...s_m and t_2...t_n. For this to work we say that the empty string is less than all non-empty strings.
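The recursive rule above can be sketched directly in Python. This is only an illustration of the definition, not how Python or C implement comparison internally; the name lex_compare is made up for this example, and single characters are compared with Python's built-in operators.

```python
def lex_compare(s: str, t: str) -> int:
    """Return -1 if s < t, 0 if s == t, 1 if s > t (hypothetical helper)."""
    if not s and not t:
        return 0   # both words exhausted: they are equal
    if not s:
        return -1  # the empty string is less than any non-empty string
    if not t:
        return 1
    if s[0] < t[0]:
        return -1
    if s[0] > t[0]:
        return 1
    # First characters are equal: recurse on the remaining characters.
    return lex_compare(s[1:], t[1:])


print(lex_compare("1040", "12000"))  # -1, i.e. "1040" < "12000"
print(lex_compare("1040", "10000"))  # 1, i.e. "1040" > "10000"
```

The rule about the empty string is what makes a word compare less than any longer word that starts with it, for example "10" < "100".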
In the old days, before Unicode and the like, the ordering on our symbols was just the ordering of the ASCII character codes. So then we have 0 < 1 < 2 < ... < 9 < ... < A < B < C < ... < Z < ... < a < b < c < ... < z. It's more complicated in the days of Unicode, but the same principle applies.
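As a quick check of that ordering, Python's built-in ord returns the code of a character, and for these ASCII characters the codes match the chain above. The variable name codes is just for this sketch.

```python
# Character codes for the endpoints of the digit, uppercase and lowercase ranges.
codes = {ch: ord(ch) for ch in "09AZaz"}
print(codes)  # {'0': 48, '9': 57, 'A': 65, 'Z': 90, 'a': 97, 'z': 122}

# Sorting single characters follows the same order: digits, uppercase, lowercase.
print(sorted(["b", "B", "2"]))  # ['2', 'B', 'b']
```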
Now, what all this means is that if we want to compare 1040 and 12000 we would use the following:
1040 compare to 12000 is equal to 040 compare to 2000, which gives 040 < 2000 because 0 < 2, so that, finally, 1040 < 12000.
1040 compare to 10000 is equal to 040 compare to 0000, which is equal to 40 compare to 000, which gives 40 > 000 because 4 > 0, so that, finally, 1040 > 10000.
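To see which position decides each of the two comparisons, here is a small sketch that reports the first index where two strings differ. The function name first_difference is hypothetical and exists only for this illustration.

```python
def first_difference(s: str, t: str):
    """Return (index, s_char, t_char) at the first mismatch, or None."""
    for i, (a, b) in enumerate(zip(s, t)):
        if a != b:
            return i, a, b
    # No mismatch in the common part: the strings are equal
    # or one is a prefix of the other.
    return None


print(first_difference("1040", "12000"))  # (1, '0', '2') -> '0' < '2'
print(first_difference("1040", "10000"))  # (2, '4', '0') -> '4' > '0'
```

Everything after the reported index, including the difference in length, plays no part in the result.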
The key here is that these are strings and do not have a numerical meaning; they are merely symbols and we have a certain ordering on our symbols. That is, we could achieve exactly the same answer if we replaced 0 by A, 1 by B, ..., and 9 by J and said that A < B < C < ... < J. (In this case we would be comparing BAEA to BAAAA and BAEA to BCAAA.)
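The substitution can be tried out in Python with str.maketrans and str.translate. The names table and to_letters are only for this sketch.

```python
# Map each digit to a letter: 0 -> A, 1 -> B, ..., 9 -> J.
table = str.maketrans("0123456789", "ABCDEFGHIJ")


def to_letters(s: str) -> str:
    """Replace every digit in s by its letter (hypothetical helper)."""
    return s.translate(table)


a, b, c = (to_letters(x) for x in ("1040", "10000", "12000"))
print(a, b, c)  # BAEA BAAAA BCAAA

# The comparisons give the same answers as with the digit strings.
print(a <= c)  # True, like "1040" <= "12000"
print(a <= b)  # False, like "1040" <= "10000"
```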