#!/usr/bin/env bash
echo 'Using conditional expression:'
[[ ' ' < '0' ]] && echo ok || echo not ok
[[ ' a' < '0a' ]] && echo ok || echo not ok
echo 'Using test:'
[ ' ' \< '0' ] && echo ok || echo not ok
[ ' a' \< '0a' ] && echo ok || echo not ok
The output is:
Using conditional expression:
ok
not ok
Using test:
ok
ok
bash --version
: GNU bash, version 4.2.45(1)-release (x86_64-pc-linux-gnu)
uname -a
: Linux linuxmint 3.8.0-19-generic
Bash manual says:
When used with [[, the < and > operators sort lexicographically using the current locale. The test command sorts using ASCII ordering.
This boils down to using strcoll(3) or strcmp(3) respectively.
Use the following program (strcoll_strcmp.c) to test this:
#include <stdio.h>
#include <string.h>
#include <locale.h>

int main(int argc, char **argv)
{
    /* Pick up the locale from the environment (LC_ALL, LC_COLLATE, LANG). */
    setlocale(LC_ALL, "");
    if (argc != 3) {
        fprintf(stderr, "Usage: %s str1 str2\n", argv[0]);
        return 1;
    }
    printf("strcoll('%s', '%s'): %d\n",
           argv[1], argv[2], strcoll(argv[1], argv[2]));
    printf("strcmp('%s', '%s'): %d\n",
           argv[1], argv[2], strcmp(argv[1], argv[2]));
    return 0;
}
Note the difference:
$ LC_ALL=C ./strcoll_strcmp ' a' '0a'
strcoll(' a', '0a'): -16
strcmp(' a', '0a'): -16
$ LC_ALL=en_US.UTF-8 ./strcoll_strcmp ' a' '0a'
strcoll(' a', '0a'): 10
strcmp(' a', '0a'): -16
I'm not sure exactly why these compare as they do; it must come from the locale's lexicographical sorting rules. I think the exact rules are described in ISO 14651 ("Method for comparing character strings and description of the common template tailorable ordering") and the accompanying template table. Glibc contains this data in the source tree under libc/localedata/locales.
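The same locale dependence can be checked from the shell without compiling anything. The following is a minimal sketch: compare_in_locale is a hypothetical helper name, and it assumes a bash binary on PATH. Only the C-locale result is predictable; under en_US.UTF-8 the answer depends on the collation tables installed on the system, and bash falls back to the C locale (with a warning) if that locale is missing.

```shell
#!/usr/bin/env bash
# compare_in_locale is a hypothetical helper, not part of bash.
# It runs the [[ ... < ... ]] test in a child bash started under the given locale.
compare_in_locale() {
    local loc=$1 a=$2 b=$3
    LC_ALL=$loc bash -c '[[ $1 < $2 ]] && echo ok || echo "not ok"' _ "$a" "$b"
}

# In the C locale strcoll() orders strings the same way strcmp() does,
# so the space (0x20) sorts before '0' (0x30).
compare_in_locale C ' a' '0a'

# Result depends on the installed collation data for this locale.
compare_in_locale en_US.UTF-8 ' a' '0a'
```

Running the comparison in a child process keeps the locale change from leaking into the calling shell.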
The behaviour that you're observing can be explained by the following from the manual:
bash-4.1 and later use the current locale’s collation sequence and strcoll(3).
You seem to be looking for comparison based on ASCII collation. You can change the behavior by setting either compat32 or compat40.
$ cat test
shopt -s compat40
echo 'Using conditional expression:'
[[ ' ' < '0' ]] && echo ok || echo not ok
[[ ' a' < '0a' ]] && echo ok || echo not ok
echo 'Using test:'
[ ' ' \< '0' ] && echo ok || echo not ok
[ ' a' \< '0a' ] && echo ok || echo not ok
$ bash test
Using conditional expression:
ok
ok
Using test:
ok
ok
From the manual:
compat32
If set, Bash changes its behavior to that of version 3.2 with respect to locale-specific string comparison when using the ‘[[’ conditional command’s ‘<’ and ‘>’ operators. Bash versions prior to bash-4.0 use ASCII collation and strcmp(3); bash-4.1 and later use the current locale’s collation sequence and strcoll(3).
compat40
If set, Bash changes its behavior to that of version 4.0 with respect to locale-specific string comparison when using the ‘[[’ conditional command’s ‘<’ and ‘>’ operators (see previous item) and the effect of interrupting a command list.
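If you would rather not change a shell-wide compatibility option, one alternative is to force the C locale only for the comparison itself. This is a sketch under assumptions, not something taken from the manual excerpt above: ascii_lt is a hypothetical helper name, and it relies on bash re-reading the locale when LC_ALL is assigned. The function body is a subshell, so the assignment does not affect the caller.

```shell
#!/usr/bin/env bash
# ascii_lt is a hypothetical helper name.
# Succeeds when $1 sorts before $2 in byte (ASCII) order, whatever the
# caller's locale is. The ( ... ) body runs in a subshell, so LC_ALL=C
# is discarded when the function returns.
ascii_lt() (
    LC_ALL=C
    [[ $1 < $2 ]]
)

ascii_lt ' a' '0a' && echo ok || echo 'not ok'
ascii_lt '0a' ' a' && echo ok || echo 'not ok'
```

The subshell costs a fork per call, which is usually fine for a handful of comparisons but worth keeping in mind inside tight loops.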